Abstract

This paper provides proofs of the rate stability, Harris recurrence, and $\epsilon$-optimality of
CSMA algorithms where the backoff parameter of each node is based on its backlog. These
algorithms require only local information and are easy to implement.

The setup is a network of wireless nodes with a fixed conflict graph that identifies pairs
of nodes whose simultaneous transmissions conflict. The paper studies two algorithms. The
first algorithm schedules transmissions to keep up with given arrival rates of packets. The
second algorithm controls the arrivals in addition to the scheduling and attempts to maximize
the sum of the utilities of the flows of packets at the different nodes. For the first algorithm,
the paper proves rate stability for strictly feasible arrival rates and also Harris recurrence of
the queues. For the second algorithm, the paper proves the $\epsilon$-optimality. Both algorithms
operate with strictly local information in the case of decreasing step sizes, and operate with
the additional information of the number of nodes in the network in the case of constant
step size.
1 Introduction

The problem of scheduling and controlling congestion in networks with conflicting nodes has
received considerable attention over the last few years for communication networks, stochastic
processing networks (cf. [22], [21]) and switched networks (cf. [50]).

Chronologically, the major steps are efficient random access algorithms, the stability of 
maximum weight scheduling (MW), randomized versions of MW, greedy algorithms with good 
throughput properties, and optimal local algorithms. 

A number of efficient random access algorithms for scheduling transmissions of nodes were 
proposed, starting with the classical ALOHA protocol [H [38]. Hajek and van Loon [2^ first
showed that an adaptive version of ALOHA achieves the maximum throughput possible for that
network. Works by Kelly and McPhee [29], [28], [35], Mosely and Humblet |l3], Tsybakov and
Likhanov [55], Aldous [2], Hastad, Leighton and Rogoff [23], Goldberg et al. [17] establish various
negative and positive results about the setup when time is slotted, packets are unit size and
packets may be queued or not queued. These papers assume that the nodes do not sense the 
transmission of other nodes. For an online survey (until October 2002) of contention resolution 
without carrier sense, see [18]. More recently, Gupta and Stolyar [19] and Stolyar [52] proposed 
algorithms that can achieve the capacity of slotted ALOHA by dynamically adjusting the access
probabilities. Another class of random access algorithms is CSMA (Carrier Sense Multiple
Access). For example, Eryilmaz, Marbach and Ozdaglar [36] showed that with a particular
interference model ("primary interference model"), properly choosing the access probabilities in
CSMA can achieve the maximum throughput in the asymptotic regime of small sensing delay
and large networks. A related work by Bordenave, McDonald and Proutiere [1] analyzes the
'capacity' of a large network (or mean field limit) for a given set of access probabilities.

The MW algorithm was proposed by Tassiulas and Ephremides [5l]. This algorithm schedules
the independent set (non-conflicting nodes) with the maximum sum of queue lengths. These
authors show that the sum of the squares of the queue lengths is a Lyapunov function, thus
proving stability. Variants of this algorithm have good delay properties (cf. Shah and Wischik
[49], [50]). Unfortunately, finding the MW independent set is NP-complete, making such
algorithms difficult to implement. The central idea of considering the maximization of the sum of
the user utilities is due to [3^. See also [Ml SO]. Combining this objective with the scheduling
appears in [S] |l5] as well as [12], [13]. For a related survey, see [HI H].

Randomized versions of MW by Tassiulas [53] and its variant by Giaccone, Prabhakar and 
Shah [16] provide a simpler (centralized) implementation of MW for input-queued switches 
while retaining the throughput property. A distributed implementation of this algorithm based 
on distributed sampling and distributed (a la gossip, cf. Shah [38]) summation procedure was 
proposed by Modiano, Shah and Zussman [Jl]. This algorithm, though simple and distributed, 
requires network-wide information exchange for each new scheduling decision. To overcome 
this limitation, Rajagopalan, Shah and Shin [46] proposed a distributed, simple, throughput
optimal algorithm that is Markovian and in which each node exchanges exactly one message/number
(through broadcast transmission) with its neighbors at each time. The algorithm is designed as
a reversible Markov chain, based on the Metropolis-Hastings method, that solves a
network-wide optimization problem inspired by the MW algorithm. The choice of the weight of a
queue as an appropriate function of the queue size plays a key role in establishing the throughput
optimality. The authors conjecture that a simplified version of their algorithm that performs no
information exchange is throughput optimal. The conjecture, as this paper is written, remains
unresolved. An interested reader can find a summary of the design and analysis of MW-based
scheduling algorithms (till 2007) for switched networks in a book chapter by Shah [47]. 

Greedy algorithms are simpler than MW. Parallel Iterative Matching [3] and iSLIP [37] were
shown to be 50% throughput optimal [9]. Subsequently, Dimakis and Walrand [11] identified
sufficient conditions on the network topology for throughput optimality. Those conditions were
further weakened to obtain fractional throughput results about a class of wireless networks by
Joo, Lin and Shroff [27] and Leconte, Ni and Srikant [31]. These algorithms are generally not
throughput optimal and require multiple rounds of message exchanges among nodes.

A class of local algorithms was proposed by Jiang and Walrand [26]. The algorithms adjust 
access probabilities in CSMA for both scheduling and congestion control by means of a novel 
optimization problem and its relation to certain reversible networks. The result is a totally 
distributed algorithm. They conjecture it to be throughput optimal and utility maximizing 
for scheduling and congestion control respectively. In [25], the authors use a suggestion by 
Shah to adjust the rates over increasing intervals and they adapt techniques from stochastic 
approximation to prove the convergence, rate stability, and optimality of the algorithms in [26]. 
Independently, Liu et al. [32] showed that, under stringent technical assumptions, the algorithm 
in [26] converges to an approximate utility maximizing solution. However, their result does 
not establish the throughput optimality (i.e., stability of queue-size in some form). Further, 
implicitly their algorithm requires some knowledge about the entire system. 

The key idea of [26] is that, instead of using the MW schedule, the algorithm attempts to 
improve the schedule to match the arrival rates into the queues. The schedule is parameterized 
by the aggressiveness with which the nodes request the channel, i.e., by a parameter of the 
backoff time in a CSMA algorithm. One then defines a distance between the actual schedule 
and the desired schedule. The gradient of that distance with respect to the aggressiveness of 
one node turns out to be the difference between the average service and arrival rates at that 
node. Since the queue length reflects this rate difference, the adjustment of the aggressiveness 
of one node is based on the queue length at that node and is local. 

The technical problem to prove the convergence and the optimality of the algorithm in [26] 
is as follows. The queue length of one node measures the difference between the actual arrivals 
and services at the node, not between the average values of those quantities, as the algorithm 
needs. The idea is that over a long enough time, the random quantities approach their mean 
values. However, the algorithm changes the parameters (the aggressiveness of the nodes). The 
intuition is that if the parameters remain constant for long enough, then the distribution of 
the underlying Markov chain approaches its invariant distribution. Consequently, the algorithm 
based on the queue lengths approaches the desired gradient algorithm. The general idea of using 
a random version of the desired gradient is at the heart of stochastic approximation (see [SJ [6] 
and [32]). Here, the additional step is to show that the Markov chain approaches its stationary 
distribution fast enough for the mean values of the observed quantities to be close to the desired 
gradient. The needed technical tool is a bound on the mixing time of the Markov chain. Here, 
as in [25] , we use a uniformized version of the continuous time Markov chain to exploit a bound 
available for the mixing time of discrete time Markov chains. 

The current paper provides an alternate proof of the rate stability. Moreover, it proves the 
Harris recurrence of the queue lengths when using a variant of the algorithm that requires that 
each node knows the total number of nodes in the network. Under that assumption, for any 
given $\epsilon > 0$, there is a congestion control algorithm that is $\epsilon$-optimal. The difference between
the proof in [25] and the current proof of rate stability is as follows. In [25], the error between
the gradient and its random version is decomposed into a martingale term and a bias. The bias
is bounded using the mixing time result. For the martingale term, [25] uses the supermartingale
convergence theorem. In the current paper, instead of using the supermartingale convergence,
the error enters in a Taylor expansion. The second term of the Taylor expansion involves a 
Hessian matrix whose entries happen to be correlations that can be bounded, using suitable 
choices of the adjustment steps in the algorithm. The proof of the Harris recurrence involves 
constructing a 'petite set.' The intuitive meaning of this set is a generalization of a recurrent 
state for a countable Markov chain. Here, the state space is not countable but one can find 
a positive recurrent set of states whose probability transitions are lower bounded by a given 
measure. Once the Markov chain hits this set, it starts afresh with at least the given measure, 
thus providing the coupling property that leads to the ergodicity of recurrent chains. 

The paper is organized as follows. Section 2 defines the network model. The main results
are stated in Section 3. Some preliminaries about Markov chains as well as a relevant (CSMA)
Markov chain are introduced in Section 4. The throughput properties of scheduling algorithms
are proved in Section 5. Specifically, rate stability and Harris recurrence properties are proved
in Section 5.1 and Section 5.2 respectively. Section 6 analyzes the congestion control problem.
Section 7 concludes the paper.

2 Model and Problem Statement 

Our network graph is a collection of $n$ queues. Time is indexed by $t \in \mathbb{R}_+$. Let $Q_i(t) \in \mathbb{R}_+$
denote the amount of work in the $i$th queue at time $t$ and let $Q(t) = [Q_i(t)]_{1 \le i \le n}$. Initially,
$t = 0$ and $Q(0) = 0$, i.e., the system starts empty. Work arrives to each queue either as per an
exogenous arrival process or is controlled by each queue as per a certain algorithm. Each queue
can potentially be serviced at unit rate resulting in the departure of work from it. Throughout
this paper, we shall assume a single-hop network. That is, once work departs from a queue, it
leaves the network. In this paper, we will not consider multihop networks but we believe that
the results of this paper can be extended to them without much difficulty.

The queues are offered service as per the constraint imposed by interference. To define
this constraint, let $G = (V, E)$ denote the interference graph between queues. Here vertices
$V = \{1, \dots, n\}$ represent the $n$ queues and edges $E \subset V \times V$ represent interfering queues:
$(i,j) \in E$ iff transmissions of queues $i$ and $j$ interfere with each other. Let
$\mathcal{N}(i) = \{j \in V : (i,j) \in E\}$ denote the neighbors of node $i$. Let $\sigma_i(t) \in \{0, 1\}$ denote whether
queue $i$ is transmitting at time $t$, with the notation that $\sigma_i(t) = 1$ represents transmission. Let
$\sigma(t) = [\sigma_i(t)]$. Then, interference imposes the constraint that for all $t \in \mathbb{R}_+$,

$$\sigma(t) \in \mathcal{I}(G) = \left\{ \rho = [\rho_i] \in \{0,1\}^n : \rho_i + \rho_j \le 1, \ \forall\, (i,j) \in E \right\}.$$

The resulting queueing dynamics are described as follows. For $0 \le s < t$ and $1 \le i \le n$,

$$Q_i(t) = Q_i(s) - \int_s^t \sigma_i(r) \mathbf{1}_{\{Q_i(r) > 0\}} \, dr + A_i(s,t), \tag{1}$$

where $A_i(s,t)$ denotes the cumulative arrival to queue $i$ in the time interval $(s,t]$ and $\mathbf{1}_{\{\cdot\}}$ denotes
the indicator function. Finally, define the cumulative departure process $D(t) = [D_i(t)]$, where

$$D_i(t) = \int_0^t \sigma_i(r) \mathbf{1}_{\{Q_i(r) > 0\}} \, dr.$$


We define the capacity region of such a wireless network. The capacity region $\mathcal{C} \subset [0,1]^n$ is the
convex hull of the feasible scheduling set $\mathcal{I}(G)$, i.e.

$$\mathcal{C} = \left\{ \sum_{\rho \in \mathcal{I}(G)} \alpha_\rho \rho \ : \ \sum_{\rho \in \mathcal{I}(G)} \alpha_\rho = 1, \ \text{and } \alpha_\rho \ge 0 \text{ for all } \rho \in \mathcal{I}(G) \right\}.$$

The intuition behind this definition of capacity region comes from the fact that any algorithm
has to choose a schedule from $\mathcal{I}(G)$ at each time and hence the time average of the 'service rate'
induced by any algorithm must belong to $\mathcal{C}$.
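
As a small illustration of these definitions, the sketch below enumerates the feasible schedules of a conflict graph by brute force and forms one point of the capacity region by time sharing. The 3-node path graph, the 0-based node labels, and the function names are assumptions made for the example only; brute-force enumeration is exponential in the number of nodes and is meant for tiny graphs.

```python
from itertools import product

def independent_sets(n, edges):
    """All schedules rho in {0,1}^n with rho_i + rho_j <= 1 on every edge."""
    return [rho for rho in product((0, 1), repeat=n)
            if all(rho[i] + rho[j] <= 1 for i, j in edges)]

def time_share(schedules, weights):
    """Convex combination sum_rho alpha_rho * rho of the given schedules."""
    assert abs(sum(weights) - 1.0) < 1e-12 and min(weights) >= 0
    n = len(schedules[0])
    return [sum(w * rho[i] for rho, w in zip(schedules, weights))
            for i in range(n)]

# Hypothetical 3-node path 0 - 1 - 2 (node 1 conflicts with both ends).
edges = [(0, 1), (1, 2)]
I_G = independent_sets(3, edges)

# Alternate equally between "nodes 0 and 2 transmit" and "node 1 transmits".
gamma = time_share([(1, 0, 1), (0, 1, 0)], [0.5, 0.5])
```

Here `gamma` is the service rate vector obtained by spending half of the time in each of the two schedules, so it lies in the convex hull by construction.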

Scheduling Problem. In this setup, we assume that the arrival process at each queue is
exogenous. Recall that $A_i(s,t)$ denotes the work that has arrived to queue $i$ in the time interval
$(s,t]$; $A_i(t) = A_i(0,t)$ represents the cumulative arrival process. We assume that the increments
in the arrival process over integral times, i.e., $A_i(k, k+1)$ for $k \in \mathbb{Z}_+$, are independent and
identically distributed with bounded support. Moreover, we assume that $A_i(k, k+1) \in [0, K]$ and
$\Pr(A_i(1) = 0) > 0$ for all $i$. Note that this setup naturally allows for $A_i$ and $A_j$ to be very
different processes for $i \ne j$. Finally, we define $\lambda_i = E[A_i(1)]$. Under our setup the strong law
of large numbers implies that

$$\lim_{t \to \infty} \frac{A_i(t)}{t} = \lambda_i, \quad \text{with probability } 1. \tag{2}$$

Let $\lambda = [\lambda_i]$. We assume that $\lambda_{\min} = \min_{1 \le i \le n} \lambda_i > 0$ without loss of generality.^1 In this
setup, we need a scheduling algorithm that decides $\sigma(t)$ at each time instant $t \in \mathbb{R}_+$. Intuitively,
we would expect that a good algorithm will keep the queues as small as possible. To make this
notion formal, first note that if $\lambda \notin \Lambda$, then no algorithm can keep the queues finite, where

$$\Lambda = \left\{ \lambda \in \mathbb{R}^n_+ : \lambda \le \gamma \text{ componentwise, for some } \gamma \in \mathcal{C} \right\}.$$

Motivated by this observation, we call $\lambda$ strictly admissible if $\lambda \in \Lambda^o$, where

$$\Lambda^o = \left\{ \lambda \in \mathbb{R}^n_+ : \lambda < \gamma \text{ componentwise, for some } \gamma \in \mathcal{C} \right\}.$$

We call a scheduling algorithm rate stable if for any $\lambda \in \Lambda^o$, the following holds with probability
1:

$$\lim_{t \to \infty} \frac{1}{t} D_i(t) = \lambda_i, \quad \forall\, 1 \le i \le n.$$

Given (2), this is equivalent to

$$\lim_{t \to \infty} \frac{Q_i(t)}{t} = 0, \quad \forall\, 1 \le i \le n.$$

^1 Note that, if $\lambda_i = 0$ for some $i$, then the algorithm will ignore such queues by setting their access probability to 0.

Rate stability is a weaker notion of throughput optimality or stability of the network. A stronger
notion requires that for any $\lambda \in \Lambda^o$ the underlying network Markov process is positive recurrent
or more generally positive Harris recurrent.

In summary, the problem of scheduling requires designing an algorithm that makes the
network-wide decisions $\sigma(t) \in \mathcal{I}(G)$ for all $t$ so that the network is throughput optimal (rate
stable or positive recurrent). The algorithm should utilize only local information, i.e., $\sigma_i(t)$
should be based on the history observed at node $i$ only and the sensing information available at
node $i$ about which of its neighbors are transmitting at a given time.

Congestion Control Problem. In this setup, unlike the scheduling problem, we require each
node or queue to control its arrival or data generation process. Specifically, at each node $i$, an
algorithm decides the rate $\lambda_i(t) \in [0, 1]$ at each time $t$. The data is generated at node $i$ as per a
deterministic process with rate $\lambda_i(t)$ at time $t$. That is, for any $0 \le s < t$,

$$A_i(s,t) = \int_s^t \lambda_i(r) \, dr.$$

Given the arrival or data generation process, the remaining problem is similar to scheduling.
That is, an algorithm is required to make decisions $\sigma(t) \in \mathcal{I}(G)$ for all $t$ using only local
information and so as to keep queues small, if possible. Now in order to determine the right rate
allocation, we assume that all nodes have some utility. Let $U_i : [0, 1] \to \mathbb{R}$ be a strictly concave
and increasing utility function of node $i$, with $U_i(x)$ representing the value of its utility when it
is allocated rate $x \in [0, 1]$. Then, ideally we wish nodes to allocate rates $\lambda^* = [\lambda^*_i]$ where

$$\lambda^* = \arg\max \sum_{i=1}^{n} U_i(\lambda_i) \quad \text{over } \lambda \in \Lambda. \tag{3}$$

In summary, the problem of congestion control requires designing an algorithm that makes
decisions $\lambda(t) \in [0, 1]^n$ and $\sigma(t) \in \mathcal{I}(G)$ for all $t$ so that $\lambda(t)$ is close to $\lambda^*$ and the network of queues
is stable, i.e. rate stable or more generally positive recurrent. The algorithm should utilize
only local information, i.e., both $\lambda_i(t)$ and $\sigma_i(t)$ should be based on the history observed at
node $i$ only and the sensing information available at node $i$ about which of its neighbors are
transmitting at a given time.

3 Main Results 

This section describes our algorithms and theorems stating their performance guarantees for 
scheduling and congestion control. The algorithms presented here are variants of algorithms 
proposed in an earlier work [26]. As noted earlier, this paper provides an alternate proof of the 
rate stability established in [25] and the new result of Harris recurrence. 

3.1 Scheduling Algorithm 

The algorithm to decide $\sigma(t)$ through local decisions $\sigma_i(t)$ can be classified as a CSMA (carrier
sense multiple access) algorithm. The basic operation of each node under such an algorithm can
be described as follows. In between two transmissions, a node waits for a random amount of
time, also known as backoff. Each node can sense the medium perfectly and instantly, i.e.,
knows if any other interfering node is transmitting at a given time instance. If a node that
finishes waiting senses the medium to be busy, it starts waiting for another random amount of
time; else, it starts transmitting for a random amount of time. The nodes repeat this operation.
The difference between all such protocols lies in the selection of the random waiting time and
random transmission time.

In this paper, we assume that node $i$'s random waiting time and transmission time have
exponential distributions with mean $1/R_i$ and 1, respectively. Therefore, the performance of the
algorithm is solely determined by the parameters $R_i$, $1 \le i \le n$. In essence, our scheduling
algorithm will learn a good value for $R_i$ at each node $i$ using only local information, so that
the algorithm is throughput optimal. It is somewhat surprising that such a
simple class of algorithms can indeed achieve the optimal throughput.

More precisely, let $R_i(t)$ be the value of parameter $R_i$ at time $t$. Given that $R_i(t)$ changes
over time, the waiting time becomes distributed according to an exponential distribution with
time varying rate. A convenient way to think of this is as follows. Suppose node $i$ starts its new
waiting period at time $t_i$ and is still waiting at time $t > t_i$. Then, given the history till time $t$,
the waiting time ends during $(t, t + \epsilon)$ with probability $R_i(t)\epsilon + o(\epsilon)$.

Given the above description, the scheduling algorithm is completely determined once we
describe how $R_i(t)$ are decided for all $i$ and all $t \in \mathbb{R}_+$. For convenience, we describe the
algorithm for selecting $r_i(t) = \ln R_i(t)$. The algorithm, at each node $i$, updates $r_i(\cdot)$ at time
instances $L(j)$, $j \in \mathbb{Z}_+$ with $L(0) = 0$. Also, $r_i(t)$ remains the same between times $L(j)$ and
$L(j+1)$ for all $j \in \mathbb{Z}_+$. To begin with, the algorithm sets $r_i(0) = 0$ for all $i$. With an abuse
of notation, from now onwards we denote by $r_i(j)$ the value of $r_i(t)$ for all $t \in [L(j), L(j+1))$.
Finally, define $T(j) = L(j+1) - L(j)$ for $j \ge 0$. Note that $T(0) = L(1) - L(0) = L(1)$.

In what follows, we describe two variants that differ in the choice of $L(j)$ and the update
procedure for $r_i(\cdot)$. The first variant uses strictly local information while the second variant uses
information about the number of nodes in the network and a performance parameter $\epsilon > 0$. We
provide theorems quantifying the performance of these variants as well.

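Scheduling Algorithm 1. In this variant, we use a varying update interval $T(j)$. Specifically,
we select

$$T(j) = \exp(\sqrt{j}) \quad \text{for } j \ge 1.$$

Given this, node $i$ updates $r_i(\cdot)$ as follows. Let $\hat\lambda_i(j), \hat s_i(j)$ be the empirical arrival and service
observed at queue $i$ in $[L(j), L(j+1))$. That is,

$$\hat\lambda_i(j) = \frac{1}{T(j)} A_i(L(j), L(j+1)) \quad \text{and} \quad \hat s_i(j) = \frac{1}{T(j)} \int_{L(j)}^{L(j+1)} \sigma_i(t) \mathbf{1}_{\{Q_i(t) > 0\}} \, dt.$$

Also, we choose a step-size $\alpha(j)$ of the algorithm as

$$\alpha(j) = \frac{1}{j} \quad \text{for } j \ge 1.$$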
Then, the update $r_i(j+1)$ of $r_i(j)$ is defined by

$$r_i(j+1) = r_i(j) + \alpha(j)\big(\hat\lambda_i(j) - \hat s_i(j)\big), \tag{4}$$

with initial condition $r_i(0) = 0$. This update rule is essentially an approximate gradient
algorithm for the optimization problem (26) defined below.

Note that, under this update rule, the algorithm at each node $i$ uses only its local history.
Despite this, we establish that this algorithm is rate stable. Formally, we obtain the following
result.

Theorem 1 The scheduling algorithm with updating rule (4) as described above is rate stable
for any $\lambda \in \Lambda^o$.
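
A minimal sketch of one step of update rule (4) at a single node may help fix ideas. It assumes update intervals $T(j) = \exp(\sqrt{j})$ and step sizes $\alpha(j) = 1/j$, and it takes the arrivals and the serving time measured over the interval as given inputs; the function names and the numbers in the example are hypothetical.

```python
import math

def update_interval(j):
    """Assumed interval length T(j) = exp(sqrt(j)) for j >= 1."""
    return math.exp(math.sqrt(j))

def step_size(j):
    """Assumed decreasing step size alpha(j) = 1/j for j >= 1."""
    return 1.0 / j

def update_r(r_i, arrivals, busy_time, j):
    """One application of update rule (4) at a single node.

    arrivals  -- work that arrived at the node during the j-th interval
    busy_time -- time the node spent serving its queue in that interval
    """
    T = update_interval(j)
    lam_hat = arrivals / T    # empirical arrival rate over the interval
    s_hat = busy_time / T     # empirical service rate over the interval
    return r_i + step_size(j) * (lam_hat - s_hat)

# Hypothetical measurement for interval j = 4, whose length is exp(2).
r_next = update_r(r_i=0.0, arrivals=3.0, busy_time=2.0, j=4)
R_next = math.exp(r_next)     # backoff rate R_i = exp(r_i)
```

Because arrivals exceeded service in this example, `r_next` increases, which makes the node back off for a shorter time on average. Only quantities observed at the node itself enter the update.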



Scheduling Algorithm 2. In this variant, we use $T(j) = T$ for some fixed $T$. The choice of
$T$ will depend on two quantities: the number $n$ of nodes in the network (we assume $n \ge 3$
here) and $\epsilon > 0$ that characterizes the approximate stability of the system. Specifically,



T = T{n,e) 



Then the updating rule becomes 

riU + 1) 



exp ( ( — log — 



nU) + a{Xi{j) + e - Si{j)) 



(5) 



(6) 



where a = a{n, e) = e^n ^/72(A' + 1)^ (here, K is the Lipschitz constant for cumulative arrival 
process) and if x = [x] n then 



n 

e 



if Xi > 
if Xi < 



(7) 



■•i otherwise. 

We state the following throughput optimality property of the algorithm using this rule.

Theorem 2 For given $\epsilon > 0$, under the above described scheduling algorithm the network is
positive Harris recurrent if $\lambda + \mathbf{1} \cdot 2\epsilon \in \Lambda^o$.



3.2 Congestion Control Algorithm 

The algorithm for congestion control has to select the appropriate values of $r_i(\cdot)$ and the arrival
rates $\lambda_i(\cdot)$. These decisions have to be taken so that the arrival rates maximize overall network
utility while keeping the queues small.

Like in the scheduling problem, the algorithm for congestion control updates its choice of
$r_i(t)$ and $\lambda_i(t)$ at time instances $L(j)$, $j \in \mathbb{Z}_+$ with $L(0) = 0$. To begin with, it sets $r_i(0) = 0$ and
$\lambda_i(0) = 1$ for all $i$. With an abuse of notation, from now onwards we denote by $r_i(j)$ (resp. $\lambda_i(j)$)
the value of $r_i(t)$ (resp. $\lambda_i(t)$) for all $t \in [L(j), L(j+1))$. As before, define $T(j) = L(j+1) - L(j)$
for $j \ge 0$. Note that $T(0) = L(1) - L(0) = L(1)$.

In what follows, we describe two algorithms for congestion control. Like the two scheduling
algorithms, the first variant does not utilize any global information while the second variant
utilizes information about the number of nodes and a performance parameter.

Congestion Control Algorithm 1. Here, $T(j) = \exp(\sqrt{j})$, $\alpha(j) = 1/j$ for $j \in \mathbb{N}$. The $r_i(\cdot), \lambda_i(\cdot)$
are updated as follows: for all $i$,

$$r_i(j+1) = \left[ r_i(j) + \alpha(j)\big(\lambda_i(j) - \hat s_i(j)\big) \right]^+, \qquad \lambda_i(j+1) = \arg\max_{y \in [0,1]} \big( \beta \cdot U_i(y) - r_i(j+1)\, y \big), \tag{8}$$

with initially $r(0) = 0$ and $\lambda(0) = 1$. Here $\beta > 0$ is an algorithm parameter and it plays a role in
determining the efficiency of the algorithm. As before, each node updates its parameters based
only on local information. Recall that each node $i$ accepts data at rate $\lambda_i(j)$ in $[L(j), L(j+1))$
deterministically. We state the following result about the performance of this algorithm.
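
The rate update in (8) is a one-dimensional concave maximization, so each node can solve it locally. The sketch below does this numerically by ternary search; the search routine, the utility $U(y) = \log(0.1 + y)$, and the parameter values are assumptions chosen for illustration and are not taken from the paper.

```python
import math

def argmax_concave(f, lo=0.0, hi=1.0, iters=200):
    """Ternary search for the maximizer of a strictly concave f on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2.0

def rate_update(U, beta, r_next):
    """lambda_i(j+1) = argmax over y in [0,1] of beta * U(y) - r_i(j+1) * y."""
    return argmax_concave(lambda y: beta * U(y) - r_next * y)

# Hypothetical utility: strictly concave and increasing on [0, 1].
U = lambda y: math.log(0.1 + y)
beta = 2.0

low_price = rate_update(U, beta, r_next=0.5)   # maximizer is the boundary y = 1
high_price = rate_update(U, beta, r_next=4.0)  # interior point beta/r - 0.1 = 0.4
```

The value $r_i(j+1)$ acts like a price: when it is small the node sends at the maximum rate, and when it is large the node throttles its arrivals.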

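Theorem 3 Under the above described algorithm, the queues $Q(\cdot)$ and arrival rates $\lambda(\cdot)$ are
such that

$$\lim_{t \to \infty} \frac{Q(t)}{t} = 0, \quad \text{and} \quad \lim_{j \to \infty} \lambda(j) = \lambda, \quad \text{with probability } 1,$$

where $\lambda$ is such that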

where recall that $\lambda^*$ is a solution to the utility maximization problem (3).



Congestion Control Algorithm 2. Here, the update interval $T(j)$ is constant and equal to a large
value $T$, for all $j$. In addition to the above, we assume that $U_i(\cdot)$ are such that

$$V = \max_i U_i'(0) < \infty, \tag{10}$$

and $V$ is known to all nodes. The algorithm performance parameter is $\epsilon > 0$. The step size $\alpha$
is a small, fixed constant in $(0, 1)$. Let $P = 4n/\epsilon$. Select $T$ such that

The updating rule is as follows. For all $i$,

$$r_i(j+1) = \left[ r_i(j) - \alpha \hat s_i(j) \right]^+ + \alpha \lambda_i(j), \qquad \lambda_i(j+1) = \arg\max_{y \in [0,1]} \big( \beta \cdot U_i(y) - r_i(j+1)\, y \big), \tag{11}$$

with initially $r(0) = 0$ and $\lambda(0) = 1$. We state the following result about this algorithm.
Theorem 4 Under the above described algorithm 2, the queue lengths $Q(\cdot)$ are such that



Qi{t) < ^ for all t > 0, for all i. 



a 



Further, define A(J) as 



Then, 



with probability 1. 



(12) 



4 Preliminaries 

This section recalls relevant known results about establishing bounds on the mixing time of Markov
chains. We start by setting up basic notation and recalling known definitions.



4.1 Markov Chain and Mixing Time 

Consider a discrete-time, time-homogeneous Markov chain over a finite state space $\Omega$. Let
an $|\Omega| \times |\Omega|$ matrix $P$ be its transition probability matrix. If $P$ is irreducible and aperiodic,
then the Markov chain has a unique stationary distribution and it is ergodic in the sense that
$\lim_{\tau \to \infty} P^\tau(j,i) = \pi_i$ for any $i,j \in \Omega$. Here $\pi = [\pi_i]$ denotes the stationary distribution of
the Markov chain. The adjoint of the transition matrix $P$, also called the time-reversal of $P$, is
denoted by $P^*$ and defined as: for any $i,j \in \Omega$, $\pi(i)P^*(i,j) = \pi(j)P(j,i)$. By definition, $P^*$ has
$\pi$ as its stationary distribution as well. If $P = P^*$ then $P$ is called reversible, and in this paper
we will be primarily interested in such reversible Markov chains.

As noted earlier, the distribution of the irreducible and aperiodic Markov chain converges to
its stationary distribution $\pi$ starting from any initial condition. To establish our results, we will
need quantifiable bounds on the time it takes for the Markov chain to get close to its stationary
distribution, popularly known as the mixing time. To make this notion precise and recall known
bounds on the mixing time, we start with the definition of distances between probability distributions.

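Definition 1 (Two distances) Given two probability distributions $\nu$ and $\mu$ on a finite space
$\Omega$, we define the following two distances. The total variation distance, denoted as $\|\nu - \mu\|_{TV}$,
is

$$\|\nu - \mu\|_{TV} = \frac{1}{2} \sum_{i \in \Omega} |\nu(i) - \mu(i)|.$$

The $\chi^2$ distance, denoted as $\left\| \frac{\nu}{\mu} - 1 \right\|_{2,\mu}$, is

$$\left\| \frac{\nu}{\mu} - 1 \right\|_{2,\mu}^2 = \sum_{i \in \Omega} \mu(i) \left( \frac{\nu(i)}{\mu(i)} - 1 \right)^2.$$

We make note of the following relation between the above defined two distances: for any
probability distributions $\mu, \nu$, using the Cauchy-Schwarz inequality we have

$$\left\| \frac{\nu}{\mu} - 1 \right\|_{2,\mu} = \sqrt{\sum_{i \in \Omega} \mu(i) \left( \frac{\nu(i)}{\mu(i)} - 1 \right)^2} \ \ge\ \sum_{i \in \Omega} \mu(i) \left| \frac{\nu(i)}{\mu(i)} - 1 \right| = \sum_{i \in \Omega} |\nu(i) - \mu(i)| = 2 \|\nu - \mu\|_{TV}. \tag{13}$$

In general, for any two vectors $u, v \in \mathbb{R}_+^{|\Omega|}$, we define the norm

$$\|v\|_{2,u}^2 = \sum_{i \in \Omega} u_i v_i^2.$$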

This norm naturally induces a matrix norm that will be useful in determining rate of convergence 
or mixing time of a finite state Markov chain. 
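
The two distances and relation (13) can be checked numerically on a small example. The following sketch uses two hypothetical distributions on a three-point space; the function names are chosen for the illustration.

```python
import math

def tv_distance(nu, mu):
    """Total variation distance: half the l1 distance between nu and mu."""
    return 0.5 * sum(abs(a - b) for a, b in zip(nu, mu))

def chi2_distance(nu, mu):
    """|| nu/mu - 1 ||_{2,mu}; requires mu(i) > 0 for every i."""
    return math.sqrt(sum(m * (a / m - 1.0) ** 2 for a, m in zip(nu, mu)))

# Hypothetical distributions on a three-point space.
nu = [0.7, 0.2, 0.1]
mu = [0.5, 0.3, 0.2]

tv = tv_distance(nu, mu)      # 0.5 * (0.2 + 0.1 + 0.1)
chi2 = chi2_distance(nu, mu)  # sqrt(0.08 + 0.1/3 + 0.05)
```

On this example the $\chi^2$ distance is slightly larger than twice the total variation distance, in line with (13).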

Definition 2 (Matrix norm) Consider an $|\Omega| \times |\Omega|$ non-negative valued matrix $A \in \mathbb{R}_+^{|\Omega| \times |\Omega|}$
and a vector $u \in \mathbb{R}_+^{|\Omega|}$. Then, the matrix norm of $A$ with respect to $u$ is defined as follows:

$$\|A\|_u = \sup_{v : E_u[v] = 0} \frac{\|Av\|_{2,u}}{\|v\|_{2,u}},$$

where $E_u[v] = \sum_i u_i v_i$.

It can be easily checked that the above definition of matrix norm satisfies the following properties.

P1. For matrices $A, B \in \mathbb{R}_+^{|\Omega| \times |\Omega|}$ and $\pi \in \mathbb{R}_+^{|\Omega|}$,

$$\|A + B\|_\pi \le \|A\|_\pi + \|B\|_\pi.$$

P2. For matrix $A \in \mathbb{R}_+^{|\Omega| \times |\Omega|}$, $\pi \in \mathbb{R}_+^{|\Omega|}$ and $c \in \mathbb{R}$,

$$\|cA\|_\pi = |c| \, \|A\|_\pi.$$

P3. Let $A$ and $B$ be transition matrices of reversible Markov chains, i.e. $A = A^*$ and $B = B^*$.
Let both of them have $\pi$ as their unique stationary distribution. Then,

$$\|AB\|_\pi \le \|A\|_\pi \|B\|_\pi.$$

P4. Let $A$ be the transition matrix of a reversible Markov chain, i.e. $A = A^*$. Let $\pi$ be its
stationary distribution. Then,

$$\|A\|_\pi = \lambda_{\max},$$

where $\lambda_{\max} = \max\{ |\lambda| : \lambda \ne 1 \text{ is an eigenvalue of } A \}$.
For a probability matrix $P$, mostly in this paper we will be interested in the matrix norm of $P$
with respect to its stationary distribution $\pi$, i.e. $\|P\|_\pi$. Therefore, unless stated otherwise, if we
use a matrix norm for a probability matrix without mentioning the reference measure, then it is
with respect to the stationary distribution. That is, in the above example $\|P\|$ will mean $\|P\|_\pi$.

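With these definitions and the fact that $P$ and $P^*$ have the same stationary distribution, say $\pi$,
it follows that for any distribution $\mu$ on $\Omega$

$$\left\| \frac{\mu P}{\pi} - 1 \right\|_{2,\pi} \le \|P^*\| \left\| \frac{\mu}{\pi} - 1 \right\|_{2,\pi}, \tag{14}$$

where we have used (abused) notation $\|P^*\| = \|P^*\|_\pi$ and since $E_\pi\left[ \frac{\mu}{\pi} - 1 \right] = 0$, with the
interpretation $\frac{\mu}{\pi} = [\mu(i)/\pi(i)]$. Therefore, for a reversible Markov chain ($P = P^*$) starting with initial
distribution $\mu(0)$, the distribution $\mu(\tau)$ at time $\tau$ is such that

$$\left\| \frac{\mu(\tau)}{\pi} - 1 \right\|_{2,\pi} \le \|P\|^\tau \left\| \frac{\mu(0)}{\pi} - 1 \right\|_{2,\pi}. \tag{15}$$

Now starting from any state $i$, i.e. probability distribution with unit mass on state $i$, the initial
distance $\left\| \frac{\mu(0)}{\pi} - 1 \right\|_{2,\pi}$ in the worst case is bounded above by $\sqrt{1/\pi_{\min}}$ where $\pi_{\min} = \min_i \pi_i$.
Therefore, for any $\delta > 0$ we have $\left\| \frac{\mu(\tau)}{\pi} - 1 \right\|_{2,\pi} \le \delta$ for any $\tau$ such that

$$\tau \ge \frac{\log 1/\pi_{\min} + \log 1/\delta}{\log 1/\|P\|} = \Theta\left( \frac{\log 1/\pi_{\min} + \log 1/\delta}{1 - \|P\|} \right).$$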

This suggests that the "mixing time", i.e. time to reach (close to) stationary distribution of 
the Markov chain scales inversely with 1 — ||P||. Therefore, we will define the "mixing time" of 
a Markov chain with transition matrix P as 1/(1 — ||P||). This also suggests that in order to 
bound the distance between a Markov chain's distribution after some steps and its stationary 
distribution, it is sufficient to obtain a bound on ||P||. 
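
The bound can be checked on a toy chain where $\|P\|$ is known in closed form. The sketch below uses a hypothetical two-state reversible chain, for which the only eigenvalue other than 1 is $1 - a - b$, so by P4 its norm is $|1 - a - b|$; the helper names and the values of $a$, $b$ and $\delta$ are assumptions for the illustration.

```python
import math

def steps_to_mix(norm_P, pi_min, delta):
    """Smallest integer tau >= (log 1/pi_min + log 1/delta) / log(1/norm_P)."""
    needed = (math.log(1 / pi_min) + math.log(1 / delta)) / math.log(1 / norm_P)
    return math.ceil(needed)

def chi2_to_stationary(mu, pi):
    """|| mu/pi - 1 ||_{2,pi} for a distribution mu and stationary pi."""
    return math.sqrt(sum(p * (m / p - 1.0) ** 2 for m, p in zip(mu, pi)))

# Hypothetical chain P = [[1-a, a], [b, 1-b]] with stationary law (b, a)/(a+b).
a, b = 0.1, 0.3
P = [[1 - a, a], [b, 1 - b]]
pi = [b / (a + b), a / (a + b)]
norm_P = abs(1 - a - b)

delta = 1e-3
tau = steps_to_mix(norm_P, min(pi), delta)

mu = [0.0, 1.0]  # start from the least likely state (the worst case)
for _ in range(tau):
    # Row vector times matrix: mu <- mu P.
    mu = [mu[0] * P[0][0] + mu[1] * P[1][0],
          mu[0] * P[0][1] + mu[1] * P[1][1]]
final_distance = chi2_to_stationary(mu, pi)
```

After `tau` steps the $\chi^2$ distance to stationarity is below $\delta$, as the bound predicts. A chain with $a + b$ closer to 0 would have $\|P\|$ closer to 1 and need proportionally more steps.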



4.2 CSMA Markov Chain & Its Mixing Time

The backbone of our algorithms for scheduling and congestion control is a Markov chain with
state space $\mathcal{I}(G)$. In recent years, this was considered in the context of CSMA by Wang
and Kar [56]. Its transition matrix is determined by the vector $r(\cdot)$ and hence is time varying.
However, if $r(\cdot)$ were fixed, then it would be a time-homogeneous reversible Markov chain. In
the context of CSMA, the vector $r(\cdot)$ corresponds to the aggressiveness of backoff. In what
follows, we describe this time-homogeneous version (i.e. assuming fixed $r(\cdot)$) of the Markov
chain, which was implicit in the description of the scheduling/congestion control algorithm, its
stationary distribution and a bound on its mixing time.

To this end, let $r(\cdot) = r$ be fixed. Recall that, under the scheduling/congestion control algorithm,
each node does the following. Each node $i$ is either in the 'transmission' state (denoted as $\sigma_i = 1$)
or the 'waiting' state (denoted by $\sigma_i = 0$). In the waiting state, the node has an exponential clock ticking
at rate $R_i = \exp(r_i)$ (mean $1/R_i$): when it ticks, if the medium is free, the node acquires it and starts
transmitting (i.e. now $\sigma_i = 1$); else if the medium is busy, it continues in the waiting state (i.e. retains
$\sigma_i = 0$). In the transmission state, the node has an exponential clock ticking at rate 1: when it ticks,
it frees the medium and enters the waiting state (i.e. now $\sigma_i = 0$).
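
The node behaviour just described can be simulated directly as a continuous time jump process. The sketch below is one way to do so under fixed $r$; the 3-node path graph, the seed, the horizon and the function name are assumptions for the example.

```python
import math
import random

def simulate_csma(neighbors, r, horizon, seed=0):
    """Simulate the fixed-r CSMA chain on [0, horizon].

    neighbors[i] lists the nodes that conflict with node i.
    Returns the fraction of time each node spent transmitting.
    """
    rng = random.Random(seed)
    n = len(r)
    sigma = [0] * n
    busy = [0.0] * n
    t = 0.0
    while True:
        # A transmitting node stops at rate 1. A waiting node with a free
        # medium starts at rate exp(r_i). A blocked node only restarts its
        # backoff, which does not change the state, so its rate is 0 here.
        rates = []
        for i in range(n):
            if sigma[i] == 1:
                rates.append(1.0)
            elif all(sigma[k] == 0 for k in neighbors[i]):
                rates.append(math.exp(r[i]))
            else:
                rates.append(0.0)
        dt = rng.expovariate(sum(rates))
        last = t + dt >= horizon
        if last:
            dt = horizon - t
        for i in range(n):
            busy[i] += sigma[i] * dt
        if last:
            return [x / horizon for x in busy]
        t += dt
        i = rng.choices(range(n), weights=rates)[0]
        sigma[i] = 1 - sigma[i]

# Hypothetical 3-node path 0 - 1 - 2 with equal aggressiveness r_i = 1.
neighbors = [[1], [0, 2], [1]]
fractions = simulate_csma(neighbors, r=[1.0, 1.0, 1.0], horizon=20000.0)
```

In this example the middle node conflicts with both of its neighbors and therefore transmits for a much smaller fraction of time than the two end nodes, even though all three nodes are equally aggressive.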

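This is a continuous time Markov chain over a finite state space. It can be easily checked that it has the following product form stationary distribution $\pi^r = [\pi^r_\sigma]$: for any $\sigma \in \mathcal{I}(G)$,

$$\pi^r_\sigma \propto \exp(\sigma \cdot r), \quad\text{that is,}\quad \pi^r_\sigma = \frac{\exp(\sigma \cdot r)}{\sum_{\rho \in \mathcal{I}(G)} \exp(\rho \cdot r)}. \tag{16}$$

Here, for vectors $a, b$, we use the dot-product notation $a \cdot b = \sum_{i=1}^{n} a_i b_i$. Under this stationary distribution the average fraction of time node $i$ ends up transmitting, which is its 'service rate', is given by

$$s_i(r) = \sum_{\sigma \in \mathcal{I}(G)} \sigma_i \, \pi^r_\sigma. \tag{17}$$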
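As a small numerical illustration of the product form distribution and of the service rate defined just above, the following Python sketch enumerates the independent sets of a conflict graph by brute force and evaluates both quantities. The three-node path graph, the function names and the choice $r = 0$ are assumptions made only for this example; they do not come from the paper.

```python
import itertools
import math


def independent_sets(n, edges):
    """Return all 0/1 vectors of length n in which no edge has both endpoints set."""
    return [
        sigma
        for sigma in itertools.product((0, 1), repeat=n)
        if all(not (sigma[u] and sigma[v]) for u, v in edges)
    ]


def stationary_distribution(r, ind_sets):
    """Product form distribution: pi_sigma is proportional to exp(sigma . r)."""
    weights = [math.exp(sum(s * x for s, x in zip(sigma, r))) for sigma in ind_sets]
    z = sum(weights)  # partition function
    return [w / z for w in weights]


def service_rates(r, ind_sets):
    """Service rate of node i: sum over independent sets of sigma_i * pi_sigma."""
    pi = stationary_distribution(r, ind_sets)
    return [sum(p * sigma[i] for p, sigma in zip(pi, ind_sets)) for i in range(len(r))]


# Hypothetical example: three nodes on a path 0 - 1 - 2, all backoff parameters zero.
edges = [(0, 1), (1, 2)]
ind_sets = independent_sets(3, edges)
r = [0.0, 0.0, 0.0]
pi = stationary_distribution(r, ind_sets)
s = service_rates(r, ind_sets)
print(len(ind_sets), s)
```

With $r = 0$ the distribution is uniform over the independent sets, so the middle node, which conflicts with both neighbours, receives the smallest service rate.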

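Throughout, we will call $s(r) = [s_i(r)]$ the service rate vector induced by $\pi^r$. To understand the 'mixing time' of this continuous time Markov chain, first consider the following discrete time version with transition matrix $P$ on $\mathcal{I}(G)$. Under $P$, the transition from the current state $\sigma \in \mathcal{I}(G)$ to the next state $\sigma' \in \mathcal{I}(G)$ happens as follows:

o Choose a node $i \in V$ with probability $\max\{\exp(r_i), 1\} / \sum_{k} \max\{\exp(r_k), 1\}$.
o If $\sigma_i = 1$ (equivalently, $i \in \sigma$), then

$$\sigma'_i = \begin{cases} 0 & \text{with probability } \min\{1/\exp(r_i), 1\} \\ 1 & \text{otherwise} \end{cases}$$

and $\sigma'_j = \sigma_j$ for $j \neq i$.
o If $\sigma_i = 0$ and $\sigma_k = 0$ for all $k \in \mathcal{N}(i)$ (i.e. $i \notin \sigma$ and $k \notin \sigma$ for all $k \in \mathcal{N}(i)$), then

$$\sigma'_i = \begin{cases} 1 & \text{with probability } \min\{\exp(r_i), 1\} \\ 0 & \text{otherwise} \end{cases}$$

and $\sigma'_j = \sigma_j$ for all $j \neq i$.
o Otherwise $\sigma' = \sigma$.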

The above discrete version of the continuous time Markov chain is reversible, i.e. $P = P^*$. It can be checked that $P$ is indeed the discretized version, i.e. $\pi^r$ is the stationary distribution of $P$.
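The reversibility claim can be checked numerically on a small instance. The sketch below builds the transition matrix of the discrete chain, assuming that node $i$ is selected with probability proportional to $\max\{\exp(r_i), 1\}$, and verifies detailed balance against the product form distribution. The graph, the parameter vector and all function names are hypothetical choices for illustration.

```python
import itertools
import math


def independent_sets(n, edges):
    """Return all 0/1 vectors of length n in which no edge has both endpoints set."""
    return [
        sigma
        for sigma in itertools.product((0, 1), repeat=n)
        if all(not (sigma[u] and sigma[v]) for u, v in edges)
    ]


def transition_matrix(r, edges):
    """Transition matrix of the discrete chain over independent sets."""
    n = len(r)
    states = independent_sets(n, edges)
    index = {s: k for k, s in enumerate(states)}
    neighbors = {i: set() for i in range(n)}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    pick = [max(math.exp(x), 1.0) for x in r]  # unnormalized node-selection weights
    total = sum(pick)
    P = [[0.0] * len(states) for _ in states]
    for a, sigma in enumerate(states):
        for i in range(n):
            if sigma[i] == 1:
                flip = min(math.exp(-r[i]), 1.0)  # stop transmitting
            elif all(sigma[k] == 0 for k in neighbors[i]):
                flip = min(math.exp(r[i]), 1.0)   # start transmitting
            else:
                flip = 0.0                        # medium busy: stay put
            if flip > 0.0:
                target = list(sigma)
                target[i] = 1 - sigma[i]
                P[a][index[tuple(target)]] += pick[i] / total * flip
            P[a][a] += pick[i] / total * (1.0 - flip)
    return states, P


# Hypothetical example: path 0 - 1 - 2 with mixed-sign backoff parameters.
edges = [(0, 1), (1, 2)]
r = [0.5, -0.3, 1.0]
states, P = transition_matrix(r, edges)
weights = [math.exp(sum(s * x for s, x in zip(sigma, r))) for sigma in states]
pi = [w / sum(weights) for w in weights]

m = len(states)
row_sums = [sum(row) for row in P]
max_db_error = max(abs(pi[a] * P[a][b] - pi[b] * P[b][a]) for a in range(m) for b in range(m))
pi_next = [sum(pi[a] * P[a][b] for a in range(m)) for b in range(m)]
print(max_db_error)
```

Detailed balance holds because selecting node $i$ and flipping it on has overall weight proportional to $\exp(r_i)$, while flipping it off has weight proportional to 1, which matches the rates of the continuous time chain.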

The continuous time Markov chain relates to the discrete time Markov chain with transition matrix $P$ described above as follows: think of the continuous time Markov chain as making its transitions when a clock of net rate $R = \sum_{k} \max\{\exp(r_k), 1\}$ ticks. When this clock ticks, the next state is chosen as per the transition matrix $P$. Given this, let $\mu(t)$ be the distribution over $\mathcal{I}(G)$ under the continuous CSMA Markov chain at time $t$. Then, the dynamics of $\mu(\cdot)$ is described as

$$\mu(t) = \sum_{i=0}^{\infty} \Pr(\zeta = i)\, \mu(0) P^i = \mu(0)\, e^{-Rt} e^{RtP} = \mu(0)\, e^{Rt(P - I)}, \tag{18}$$

where $\zeta$ is a Poisson random variable with parameter $Rt$, equal to the number of clock ticks in time $[0,t]$. In the above (and throughout), in the left multiplication of a vector with a matrix, the vector should be thought of as a row vector.
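The Poisson-mixture representation of the continuous time dynamics can be illustrated on the simplest possible instance. The sketch below assumes a single node with no conflicts, so the state is 0 (waiting) or 1 (transmitting), and compares a truncated Poisson sum against the closed form solution of the two-state chain. The parameter values, the truncation length and the helper names are assumptions for this example only.

```python
import math


def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[a] * M[a][b] for a in range(len(v))) for b in range(len(M[0]))]


def uniformized_distribution(mu0, P, R, t, terms=80):
    """Approximate mu(0) exp(R t (P - I)) by a truncated sum of Poisson(R t) weights times mu(0) P^i."""
    weight = math.exp(-R * t)  # Pr(zeta = 0)
    vec = list(mu0)
    out = [weight * x for x in vec]
    for i in range(1, terms):
        vec = vec_mat(vec, P)
        weight *= R * t / i    # Pr(zeta = i) from Pr(zeta = i - 1)
        out = [o + weight * x for o, x in zip(out, vec)]
    return out


# Hypothetical single-node example with backoff parameter r = 1.
r = 1.0
R = max(math.exp(r), 1.0)
up = min(math.exp(r), 1.0)     # probability of starting a transmission
down = min(math.exp(-r), 1.0)  # probability of ending a transmission
P = [[1.0 - up, up], [down, 1.0 - down]]
mu0 = [1.0, 0.0]
t = 0.7
mu_t = uniformized_distribution(mu0, P, R, t)

# Closed form for the two-state chain with rates exp(r) (0 -> 1) and 1 (1 -> 0).
pi1 = math.exp(r) / (1.0 + math.exp(r))
closed_form = pi1 + (mu0[1] - pi1) * math.exp(-(1.0 + math.exp(r)) * t)
print(mu_t[1], closed_form)
```

The two numbers agree up to truncation error, which is negligible here because the Poisson mean $Rt$ is small compared with the number of terms kept.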

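Given (18) and the earlier discussion on matrix norms and the mixing time analysis for discrete time Markov chains, we obtain that

$$\left\| \frac{\mu(t)}{\pi^r} - 1 \right\|_{2,\pi^r} \le \left\| e^{Rt(P-I)} \right\| \left\| \frac{\mu(0)}{\pi^r} - 1 \right\|_{2,\pi^r}.$$

Therefore, to bound the distance between $\mu(t)$ and $\pi^r$, we need to get a bound on $\| e^{Rt(P-I)} \|$.

Lemma 5 The matrix norm of $e^{Rt(P-I)}$ is bounded as

$$\left\| e^{Rt(P-I)} \right\| \le \left( 1 - \frac{1}{\exp\left(\Theta(n \|r\|_\infty + n)\right)} \right)^{\lfloor t \rfloor}.$$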
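Proof. Define the partition function or normalization constant $Z(r)$ of $\pi^r$ as

$$Z(r) = \sum_{\sigma \in \mathcal{I}(G)} \exp(\sigma \cdot r).$$

It follows that

$$Z(r) \le |\mathcal{I}(G)| \exp(n \|r\|_\infty) \le \exp\left(n(1 + \|r\|_\infty)\right).$$

Therefore, for any $\sigma \in \mathcal{I}(G)$,

$$\pi^r_\sigma = \frac{\exp(\sigma \cdot r)}{Z(r)} \ge \exp\left(-n(1 + 2\|r\|_\infty)\right). \tag{19}$$

Now for any $\sigma, \rho \in \mathcal{I}(G)$ that differ in only one component, i.e. it is possible to transit from $\sigma$ to $\rho$ and vice versa in one step, we have

$$\left[ e^{R(P-I)} \right]_{\sigma\rho} \ge \Pr(\zeta = 1)\, P_{\sigma\rho} \ge \exp\left(-O(n \|r\|_\infty + n)\right).$$

In the above we used the fact that $\zeta$ is a Poisson random variable with parameter $\sum_i \max\{\exp(r_i), 1\}$, which is at most $n(1 + \exp(\|r\|_\infty))$.

Given the above calculations, we are ready to bound the conductance, $\Phi$, of $W = e^{R(P-I)}$, defined as

$$\Phi = \min_{S \subset \mathcal{I}(G)} \frac{Q(S, \mathcal{I}(G) \setminus S)}{\pi^r(S)\, \pi^r(\mathcal{I}(G) \setminus S)} \ge \min_{\sigma \neq \rho:\, P_{\sigma\rho} > 0} \pi^r_\sigma W_{\sigma\rho} \ge \exp\left(-O(n \|r\|_\infty + n)\right), \tag{20}$$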



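where $Q(A, B) = \sum_{\sigma \in A, \rho \in B} \pi^r_\sigma W_{\sigma\rho}$. By Cheeger's inequality [Ml EOl ESl HI], it is well known that

$$\lambda_{\max} \le 1 - \frac{\Phi^2}{2} \le 1 - \exp\left(-\Theta(n |r|_{\max} + n)\right), \tag{21}$$

where $|r|_{\max} = \|r\|_\infty$.

Hence, from the properties P3 and P4 of the matrix norm, we can conclude that

$$\left\| e^{Rt(P-I)} \right\| \le \left\| e^{R(P-I)} \right\|^{\lfloor t \rfloor} \le \lambda_{\max}^{\lfloor t \rfloor} \le \left(1 - \exp\left(-\Theta(n |r|_{\max} + n)\right)\right)^{\lfloor t \rfloor}. \tag{22}$$

□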



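Using Lemma 5 and the fact that $\left\| \frac{\mu(0)}{\pi^r} - 1 \right\|_{2,\pi^r} \le \sqrt{1/\pi_{\min}} \le \exp(O(n \|r\|_\infty + n))$, we obtain

$$\left\| \frac{\mu(t)}{\pi^r} - 1 \right\|_{2,\pi^r} \le \delta \quad \text{for } t \ge \exp\left(\Theta(n \|r\|_\infty + n)\right) \log \frac{1}{\delta}. \tag{23}$$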



4.3 Positive Harris Recurrence 

We recall the well known notion of positive Harris recurrence for discrete time Markov chains. To this end, consider a time-homogeneous discrete time Markov chain over a Polish space $\mathsf{X}$, denoted by $X(\tau) \in \mathsf{X}$ for time $\tau \in \mathbb{N} \cup \{0\}$. Let $\mathcal{B}_{\mathsf{X}}$ be the Borel $\sigma$-algebra of $\mathsf{X}$ with respect to its topology. Let $P$ denote the probability transition matrix of this discrete-time $\mathsf{X}$-valued Markov chain. Given a probability distribution (also called sampling distribution) $a$ on $\mathbb{N}$, the $a$-sampled transition matrix of the Markov chain, denoted by $K_a$, is defined as

$$K_a(x, B) = \sum_{\tau \ge 0} a(\tau) P^\tau(x, B), \quad \text{for any } x \in \mathsf{X},\ B \in \mathcal{B}_{\mathsf{X}}.$$

Now we recall the notion of a petite set [39]. A non-empty set $A \in \mathcal{B}_{\mathsf{X}}$ is called $\mu_a$-petite if $\mu_a$ is a non-trivial measure on $(\mathsf{X}, \mathcal{B}_{\mathsf{X}})$ and $a$ is a probability distribution on $\mathbb{N}$ such that for any $x \in A$,

$$K_a(x, \cdot) \ge \mu_a(\cdot).$$

A set is called a petite set if it is $\mu_a$-petite for some such non-trivial measure $\mu_a$.

We will call the Markov chain positive Harris recurrent if there exists a closed petite set that is positive recurrent. This is formally summarized as the following known result (see the book by Meyn and Tweedie [39] or the survey by Foss and Konstantopoulos [H] for details).

Theorem 6 Let $B$ be a closed petite set. Suppose $B$ is recurrent, i.e.

$$\Pr(T_B < \infty \mid X(0) = x) = 1, \quad \text{for any } x \in \mathsf{X},$$

where $T_B = \inf\{\tau \ge 1 : X(\tau) \in B\}$. Further, let

$$\sup_{x \in B} E_x[T_B] < \infty.$$

Then the Markov chain is positive Harris recurrent.

Theorem 6 suggests that to establish the positive Harris recurrence of the network Markov chain, it is sufficient to find a closed petite set that satisfies the conditions of Theorem 6. To establish the recurrence property of a set, the following Lyapunov and Foster's criteria will be useful.

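Lemma 7 Let there exist functions $h, g : \mathsf{X} \to \mathbb{R}$ and $L : \mathsf{X} \to \mathbb{R}_+$ such that for any $x \in \mathsf{X}$,

$$E\left[ L(X(g(x))) - L(X(0)) \mid X(0) = x \right] \le -h(x),$$

and

(a) $\inf_{x \in \mathsf{X}} h(x) > -\infty$,

(b) $\liminf_{L(x) \to \infty} h(x) > 0$,

(c) $\sup_{L(x) \le \gamma} g(x) < \infty$ for all $\gamma > 0$,

(d) $\limsup_{L(x) \to \infty} g(x)/h(x) < \infty$.

Then, there exists a finite $\kappa > 0$ so that for the set $B_\kappa = \{x : L(x) \le \kappa\}$, the following holds:

$$E_x\left[T_{B_\kappa}\right] < \infty \ \text{ for any } x \in \mathsf{X}, \qquad \sup_{x \in B_\kappa} E_x\left[T_{B_\kappa}\right] < \infty.$$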

5 Throughput Property of Scheduling Algorithms

This section establishes throughput optimality for the two scheduling algorithms proposed in Section 3.1. Specifically, we present the proof of Theorem 1, which establishes rate stability of scheduling algorithm 1, in Section 5.1, and the proof of Theorem 2, which establishes positive Harris recurrence of scheduling algorithm 2, in Section 5.2. As noted earlier, algorithm 1 does not utilize any global information, while algorithm 2 utilizes global information only in terms of the number of nodes in the network.

5.1 Proof of Theorem 1: Rate stability

The proof of Theorem 1 consists of three parts. First, we introduce and study a relevant optimization problem whose parameters are the backoff parameters $r(\cdot)$. On one hand, it is related to the classical variational principle studied in the context of Gibbs distributions or Markov Random Fields (e.g. Chapter 15.4, Georgii [T5]). On the other hand, it will suggest that the optimal solution corresponding to $r(\cdot)$, say $r^*$, is such that the service rate vector $s(r^*)$, induced by the Markov chain's stationary distribution, is the same as the arrival rate vector $\lambda$. Therefore, if Algorithm 1 adjusts $r(\cdot)$ appropriately so that $r(\cdot)$ converges to $r^*$, then there is a possibility of establishing rate stability. In the second part, we do so by showing that Algorithm 1 is a stochastic gradient algorithm for the optimization problem of interest. Finally, in the third part, we conclude the proof of Theorem 1 by establishing that the system is rate stable for any $\lambda \in \Lambda^o$.

A relevant optimization problem & its properties. We begin by introducing the optimization problem of interest. Its relation to the variational principle will be alluded to later. To this end, given an arrival rate vector $\lambda \in \Lambda^o$ and $r \in \mathbb{R}^n$, define the function $F(r, \lambda)$ as

$$F(r, \lambda) = \lambda \cdot r - \log\left( \sum_{\sigma \in \mathcal{I}(G)} \exp(\sigma \cdot r) \right). \tag{24}$$

The interpretation of $F(r, \lambda)$ is as follows. Assume that $\lambda$ is strictly feasible, i.e. $\lambda \in \Lambda^o$, so that it can be written as a positive combination of feasible transmission vectors. That is,

$$\lambda = \sum_{\sigma \in \mathcal{I}(G)} \nu_\sigma \sigma$$

for a non-negative vector $\nu = [\nu_\sigma]$. Therefore, if $\sigma \in \mathcal{I}(G)$ is scheduled for a $\nu_\sigma$ fraction of the time, then the effective service rate is the same as the arrival rate $\lambda$. Clearly, $\nu$ can be thought of as a probability distribution on $\mathcal{I}(G)$ as well.

Now consider the Kullback-Leibler divergence or relative entropy between this distribution $\nu$ and $\pi^r$, the stationary distribution of the CSMA Markov chain with parameters $r$, defined as follows:

$$d(\nu, \pi^r) = \sum_{\sigma \in \mathcal{I}(G)} \nu_\sigma \log \frac{\nu_\sigma}{\pi^r_\sigma}.$$

It is well known (Pinsker's inequality) that

$$\| \nu - \pi^r \|_{TV}^2 \le d(\nu, \pi^r),$$

with total variation taken as half the $\ell_1$ distance.

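However, $d(\cdot, \cdot)$ is not a metric; it is only a pre-metric. Consider the following relation between $F(r, \lambda)$ and $d(\nu, \pi^r)$:

$$\begin{aligned}
F(r, \lambda) &= \sum_{\rho \in \mathcal{I}(G)} \nu_\rho\, \rho \cdot r - \log\left( \sum_{\sigma \in \mathcal{I}(G)} \exp(\sigma \cdot r) \right) \\
&= \sum_{\rho \in \mathcal{I}(G)} \nu_\rho \log\left( \frac{\exp(\rho \cdot r)}{\sum_{\sigma \in \mathcal{I}(G)} \exp(\sigma \cdot r)} \right) \\
&= \sum_{\rho \in \mathcal{I}(G)} \nu_\rho \log \frac{\pi^r_\rho}{\nu_\rho} + \sum_{\rho \in \mathcal{I}(G)} \nu_\rho \log \nu_\rho \\
&= -d(\nu, \pi^r) - H_{ER}(\nu),
\end{aligned} \tag{25}$$

where $H_{ER}(\nu)$ denotes the entropy of $\nu$. Thus, for a given fixed $\lambda$, we have that

$$d(\nu, \pi^r) + F(r, \lambda) = \text{Constant}.$$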
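The relation between $F$, the relative entropy and the entropy can be checked numerically. The following sketch picks an arbitrary strictly positive distribution $\nu$ over the independent sets of a small graph, sets $\lambda$ to the corresponding mixture, and confirms that $F(r, \lambda) + d(\nu, \pi^r) + H_{ER}(\nu)$ vanishes for an arbitrary $r$. The graph, the numbers in `nu` and `r`, and the function names are illustrative assumptions.

```python
import itertools
import math


def independent_sets(n, edges):
    """Return all 0/1 vectors of length n in which no edge has both endpoints set."""
    return [
        sigma
        for sigma in itertools.product((0, 1), repeat=n)
        if all(not (sigma[u] and sigma[v]) for u, v in edges)
    ]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Hypothetical example: path 0 - 1 - 2, which has five independent sets.
edges = [(0, 1), (1, 2)]
ind_sets = independent_sets(3, edges)

nu = [0.30, 0.20, 0.10, 0.25, 0.15]  # arbitrary positive weights, one per independent set
lam = [sum(w * sigma[i] for w, sigma in zip(nu, ind_sets)) for i in range(3)]
r = [0.4, -0.7, 1.1]                  # arbitrary backoff parameters

log_z = math.log(sum(math.exp(dot(sigma, r)) for sigma in ind_sets))
pi = [math.exp(dot(sigma, r) - log_z) for sigma in ind_sets]

F = dot(lam, r) - log_z                                   # objective F(r, lambda)
kl = sum(w * math.log(w / p) for w, p in zip(nu, pi))     # d(nu, pi^r)
entropy = -sum(w * math.log(w) for w in nu)               # H_ER(nu)
gap = F + kl + entropy
print(gap)
```

Since the entropy term does not depend on $r$, any change in $F$ as $r$ varies is mirrored exactly by the opposite change in the relative entropy.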

Therefore, minimizing $d(\nu, \pi^r)$ with respect to the parameter $r$ is equivalent to maximizing $F(r, \lambda)$. As we shall show, this optimization over $r$ leads to an $r^*$ such that $s(r^*)$ equals $\lambda$ as long as $\lambda \in \Lambda^o$. For this reason, the following is the optimization problem of interest.

$$\text{maximize } F(r, \lambda) \quad \text{subject to } r \in \mathbb{R}^n. \tag{26}$$

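Now we state the following useful properties of this optimization problem.

Lemma 8 Consider a given $\lambda \in \mathbb{R}^n$. Then, the following holds.

(1) The objective function $F(r, \lambda)$, as a function of $r$, is strictly concave. Moreover,

$$\frac{\partial}{\partial r_i} F(r, \lambda) = \lambda_i - s_i(r) \tag{27}$$

and

$$-\frac{\partial^2}{\partial r_i \partial r_j} F(r, \lambda) = E_{\pi^r}[\sigma_i \sigma_j] - E_{\pi^r}[\sigma_i]\, E_{\pi^r}[\sigma_j]. \tag{28}$$

(2) For $\lambda \in \Lambda^o$, the optimization problem (26) has a unique solution $r^*(\lambda)$ that is attained, and $F(r^*(\lambda)) < 0$. Let $r = r^*(\lambda)$. Then, under $\pi^r$ the 'service rate vector' (as defined in (17)) $s(r) = [s_i(r)]$ equals $\lambda$. That is,

$$\sum_{\sigma \in \mathcal{I}(G)} \sigma_i \pi^r_\sigma = \lambda_i, \quad \text{for all } i.$$

(3) Further, for any $\varepsilon > 0$ such that $\lambda + \varepsilon \mathbf{1} \in \Lambda$,

$$\| r^*(\lambda) \|_\infty \le \frac{\log |\mathcal{I}(G)|}{\min\{\varepsilon, \lambda_{\min}\}}.$$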
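To make the lemma concrete, the sketch below runs plain (deterministic) gradient ascent on $F(\cdot, \lambda)$ for a small example and checks that the service rates at the maximizer match $\lambda$ and that the maximizer respects the stated norm bound. This is only an illustration under assumed inputs: the graph, the vector `lam`, the value `eps`, the constant step size and the iteration count are all hypothetical, and the paper's Algorithm 1 uses noisy empirical estimates rather than the exact gradient used here.

```python
import itertools
import math


def independent_sets(n, edges):
    """Return all 0/1 vectors of length n in which no edge has both endpoints set."""
    return [
        sigma
        for sigma in itertools.product((0, 1), repeat=n)
        if all(not (sigma[u] and sigma[v]) for u, v in edges)
    ]


def service_rates(r, ind_sets):
    """s_i(r): probability that node i transmits under the product form distribution."""
    weights = [math.exp(sum(s * x for s, x in zip(sigma, r))) for sigma in ind_sets]
    z = sum(weights)
    return [sum(w * sigma[i] for w, sigma in zip(weights, ind_sets)) / z for i in range(len(r))]


def objective(r, lam, ind_sets):
    """F(r, lambda) = lambda . r - log Z(r)."""
    z = sum(math.exp(sum(s * x for s, x in zip(sigma, r))) for sigma in ind_sets)
    return sum(l * x for l, x in zip(lam, r)) - math.log(z)


def gradient_ascent(lam, ind_sets, step=0.5, iters=5000):
    """Exact gradient ascent: the gradient of F is lambda - s(r)."""
    r = [0.0] * len(lam)
    for _ in range(iters):
        s = service_rates(r, ind_sets)
        r = [x + step * (l - si) for x, l, si in zip(r, lam, s)]
    return r


# Hypothetical example: path 0 - 1 - 2.
edges = [(0, 1), (1, 2)]
ind_sets = independent_sets(3, edges)
lam = [0.3, 0.2, 0.3]
eps = 0.1  # lam + eps*1 = 0.4*(1,0,1) + 0.3*(0,1,0) + 0.3*(0,0,0), hence feasible

r_star = gradient_ascent(lam, ind_sets)
s_star = service_rates(r_star, ind_sets)
K = math.log(len(ind_sets)) / min(eps, min(lam))
print(r_star, s_star, objective(r_star, lam, ind_sets), K)
```

In this instance the maximizer is far inside the box $[-K, K]^n$, which is consistent with the bound in part (3) being a worst-case guarantee rather than a tight estimate.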

Proof. For simplicity of notation, we drop the reference to $\lambda$ in $F(r, \lambda)$ and simply write $F(r)$, as $\lambda$ is fixed throughout the proof. We will also use the partition function $Z(r)$ of $\pi^r$, defined as

$$Z(r) = \sum_{\sigma \in \mathcal{I}(G)} \exp(\sigma \cdot r).$$

Proof of (1). We wish to establish that $F(r)$ is strictly concave as a function of $r$. To this end, its first derivative can be calculated as

$$\frac{\partial F}{\partial r_i} = \lambda_i - \frac{\sum_{\sigma \in \mathcal{I}(G)} \sigma_i \exp(\sigma \cdot r)}{\sum_{\rho \in \mathcal{I}(G)} \exp(\rho \cdot r)} = \lambda_i - E_{\pi^r}[\sigma_i] = \lambda_i - s_i(r). \tag{29}$$

Here, we have used the definition of $\pi^r$ in (16).

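To obtain strict concavity, we would like to show that the Hessian of $F$ is negative definite. Now, we compute the second derivative as (using (29))

$$\begin{aligned}
-\frac{\partial^2 F}{\partial r_i \partial r_j} &= \frac{\partial}{\partial r_j} \left( \frac{\sum_{\sigma \in \mathcal{I}(G)} \sigma_i \exp(\sigma \cdot r)}{Z(r)} \right) \\
&= E_{\pi^r}[\sigma_i \sigma_j] - \frac{\left( \sum_{\sigma} \sigma_i \exp(\sigma \cdot r) \right) \left( \sum_{\rho} \rho_j \exp(\rho \cdot r) \right)}{Z(r)^2} \\
&= E_{\pi^r}[\sigma_i \sigma_j] - E_{\pi^r}[\sigma_i]\, E_{\pi^r}[\sigma_j].
\end{aligned} \tag{30}$$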

Thus, the Hessian of $F$, denoted by $M = [M_{ij}]$ with $M_{ij} = \frac{\partial^2 F}{\partial r_i \partial r_j}$, is the negative of the covariance matrix of a random vector with distribution $\pi^r$. It is well known that covariance matrices are positive semi-definite, i.e., $M$ is negative semi-definite. For strict concavity of $F$, we need to show that $M$ is negative definite, i.e. that the covariance matrix of $\pi^r$ is positive definite. To this end, let $X$ be a vector of $n$ binary random variables with joint distribution $\pi^r$. Let $\mu = E[X] \in \mathbb{R}^n$ be its mean vector. Then, from the above we have that $-M = E[(X - \mu)(X - \mu)^T]$. Now consider any vector $\zeta \in \mathbb{R}^n$. To establish the positive definiteness of $-M$, we need to show that

$$\zeta^T (-M) \zeta > 0 \iff \zeta \neq 0.$$

Suppose to the contrary that there exists a vector $\zeta \neq 0$ such that $\zeta^T (-M) \zeta \le 0$. Clearly,

$$\zeta^T (-M) \zeta = \zeta^T E[(X - \mu)(X - \mu)^T] \zeta = E[\zeta^T (X - \mu)(X - \mu)^T \zeta] \ge 0.$$

Therefore, let us assume that

$$\zeta^T (-M) \zeta = E[\zeta^T (X - \mu)(X - \mu)^T \zeta] = 0.$$

That is, the random variable $\zeta^T (X - \mu) = 0$ with probability 1 with respect to $\pi^r$. Now consider the $n$ vectors $e_1, \ldots, e_n$, where in $e_i$ only node $i$ is selected; i.e., $e_i \in \{0, 1\}^n$ has $i$th component 1 and all other components 0. By definition $\pi^r(e_i) > 0$ for any $r$. Therefore, the above condition implies that for all $i$, $\zeta^T (e_i - \mu) = 0$. That is, for all $i$,

$$\zeta_i (1 - \mu_i) = \sum_{j \neq i} \zeta_j \mu_j. \tag{31}$$

That is, for all $i$, $\zeta_i = c$, where $c = \zeta^T \mu$ does not depend on $i$. Now applying the same argument with the choice $\sigma = 0$, we obtain that

$$c\, \mathbf{1}^T \mu = 0.$$

This immediately implies that $c = 0$, since $\mu^T \mathbf{1} > 0$ for any $r$. Thus, we have proved that if $\zeta^T (-M) \zeta \le 0$ then it must be that $\zeta = 0$. That is, $M$ is negative definite and hence $F$ is strictly concave. This completes the proof of (1) of Lemma 8.
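The positive definiteness of the covariance matrix under $\pi^r$ can also be observed numerically. The sketch below computes the covariance of the transmission indicators for a small example and attempts a Cholesky factorization, which succeeds exactly when the matrix is positive definite. The graph, the parameter vector and the helper names are hypothetical, and the hand-written factorization is used only to keep the example free of external libraries.

```python
import itertools
import math


def independent_sets(n, edges):
    """Return all 0/1 vectors of length n in which no edge has both endpoints set."""
    return [
        sigma
        for sigma in itertools.product((0, 1), repeat=n)
        if all(not (sigma[u] and sigma[v]) for u, v in edges)
    ]


def covariance(r, ind_sets):
    """Covariance matrix of the indicator vector sigma under pi^r."""
    weights = [math.exp(sum(s * x for s, x in zip(sigma, r))) for sigma in ind_sets]
    z = sum(weights)
    pi = [w / z for w in weights]
    n = len(r)
    mean = [sum(p * s[i] for p, s in zip(pi, ind_sets)) for i in range(n)]
    return [
        [sum(p * s[i] * s[j] for p, s in zip(pi, ind_sets)) - mean[i] * mean[j] for j in range(n)]
        for i in range(n)
    ]


def cholesky(a):
    """Return lower-triangular L with L L^T = a, or None if a is not positive definite."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if acc <= 0.0:
                    return None
                low[i][j] = math.sqrt(acc)
            else:
                low[i][j] = acc / low[j][j]
    return low


# Hypothetical example: path 0 - 1 - 2 with arbitrary backoff parameters.
edges = [(0, 1), (1, 2)]
ind_sets = independent_sets(3, edges)
cov = covariance([0.7, -1.2, 0.3], ind_sets)
factor = cholesky(cov)
print(factor is not None)
```

A successful factorization means the Hessian of $F$, which is the negative of this matrix, is negative definite at the chosen $r$.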

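Proof of (2) and (3). We wish to establish that for $\lambda \in \Lambda^o$, the optimization problem has a unique solution that is attained. We will establish this by showing that the optimal solution must lie inside a closed, bounded and convex set since $\lambda \in \Lambda^o$. As a byproduct, this will provide (3). Then, the strict concavity of $F$ will immediately lead to the existence of a unique solution, and to the claim that $\lambda = s(r^*(\lambda))$ as a result of the local optimality condition. As the first step towards this, we establish that $F(r^*(\lambda)) < 0$.

To this end, since $\lambda \in \Lambda^o$, it can be easily checked that there exists a distribution $\nu$ on $\mathcal{I}(G)$ such that

$$\lambda = \sum_{\sigma \in \mathcal{I}(G)} \nu_\sigma \sigma. \tag{32}$$

Therefore, using (32) in the definition of $F$, we have

$$\begin{aligned}
F(r) &= \left( \sum_{\rho \in \mathcal{I}(G)} \nu_\rho \rho \right) \cdot r - \log\left( \sum_{\sigma \in \mathcal{I}(G)} \exp(\sigma \cdot r) \right) \\
&= \sum_{\rho \in \mathcal{I}(G)} \nu_\rho \log \exp(\rho \cdot r) - \log\left( \sum_{\sigma \in \mathcal{I}(G)} \exp(\sigma \cdot r) \right) \\
&= \sum_{\rho \in \mathcal{I}(G)} \nu_\rho \log\left( \frac{\exp(\rho \cdot r)}{\sum_{\sigma \in \mathcal{I}(G)} \exp(\sigma \cdot r)} \right) \\
&< 0.
\end{aligned} \tag{33}$$

The last step follows because (i) for any $\rho \in \mathcal{I}(G)$, $\exp(\rho \cdot r) < Z(r)$ since any graph has at least two independent sets; (ii) for some $\rho \in \mathcal{I}(G)$, $\nu_\rho > 0$.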
Next, we will show that if $\lambda + \varepsilon \mathbf{1} \in \Lambda$, then

$$\sup_{r \in \mathbb{R}^n} F(r) = \sup_{r \in [-K, K]^n} F(r), \quad \text{where } K = \frac{\log |\mathcal{I}(G)|}{\min\{\varepsilon, \lambda_{\min}\}}. \tag{34}$$

To establish (34), we will show that for any $r \in \mathbb{R}^n$, if (a) $r_{\max} := \max_{1 \le i \le n} r_i > K \ge \log |\mathcal{I}(G)| / \varepsilon$ or (b) $r_{\min} := \min_{1 \le i \le n} r_i < -K$, then $F(r) < F(0) = -\log |\mathcal{I}(G)|$. As a byproduct, this will imply (3) of Lemma 8.

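First, for case (a), consider a given $r$ such that $r_{\max} > \log |\mathcal{I}(G)| / \varepsilon$. Since $\lambda + \varepsilon \cdot \mathbf{1} \in \Lambda$ and $\sigma \in \{0, 1\}^n$, there exists a non-negative valued measure $\nu$ on $\mathcal{I}(G)$ such that

$$\lambda = \sum_{\sigma \in \mathcal{I}(G)} \nu_\sigma \sigma, \quad \text{and} \quad \sum_{\sigma \in \mathcal{I}(G)} \nu_\sigma \le 1 - \varepsilon. \tag{35}$$

This implies the existence of a distribution $\nu'$ on $\mathcal{I}(G)$ defined as

$$\nu'_\sigma = \begin{cases} \nu_\sigma + \left( 1 - \sum_{\rho \in \mathcal{I}(G)} \nu_\rho \right) & \text{if } \sigma = 0 \\ \nu_\sigma & \text{otherwise.} \end{cases}$$

Note that $\lambda = \sum_{\sigma \in \mathcal{I}(G)} \nu'_\sigma \sigma$. Therefore,

$$\begin{aligned}
F(r) &= \lambda \cdot r - \log\left( \sum_{\sigma \in \mathcal{I}(G)} \exp(\sigma \cdot r) \right) \\
&= \sum_{\rho \in \mathcal{I}(G)} \nu'_\rho \log\left( \frac{\exp(\rho \cdot r)}{\sum_{\sigma \in \mathcal{I}(G)} \exp(\sigma \cdot r)} \right) \\
&\le \varepsilon \log\left( \frac{1}{\exp(r_{\max})} \right) \\
&< -\log |\mathcal{I}(G)| \\
&= F(0) \\
&\le \sup_{r} F(r).
\end{aligned} \tag{36}$$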



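Now, we prove case (b). For this, let $r$ be such that $r_{\min} < -\log |\mathcal{I}(G)| / \min\{\varepsilon, \lambda_{\min}\}$. Let $i$ be such that $r_i = r_{\min}$. Define $\tilde\lambda$ as $\tilde\lambda_i = \lambda_i - \min\{\varepsilon, \lambda_i\}$ and $\tilde\lambda_j = \lambda_j$ for $j \neq i$. Clearly, $\tilde\lambda + \min\{\varepsilon, \lambda_i\} \cdot \mathbf{1} \in \Lambda^o$. Therefore, similar to (35), there exists a non-negative valued measure $\nu$ on $\mathcal{I}(G)$ so that

$$\tilde\lambda = \sum_{\sigma \in \mathcal{I}(G)} \nu_\sigma \sigma, \quad \text{and} \quad \sum_{\sigma \in \mathcal{I}(G)} \nu_\sigma \le 1 - \min\{\varepsilon, \lambda_i\}. \tag{37}$$

Now define a distribution $\nu'$ on $\mathcal{I}(G)$ such that

$$\nu'_\sigma = \begin{cases} \nu_\sigma + \min\{\varepsilon, \lambda_i\} & \text{if } \sigma = e_i \\ \nu_\sigma + \left( 1 - \sum_{\rho \in \mathcal{I}(G)} \nu_\rho \right) - \min\{\varepsilon, \lambda_i\} & \text{if } \sigma = 0 \\ \nu_\sigma & \text{otherwise.} \end{cases}$$

Here, as before, $e_i$ refers to the independent set with only node $i$ transmitting. Note that $\lambda = \sum_{\sigma \in \mathcal{I}(G)} \nu'_\sigma \sigma$. Now, combined with the fact that $r_i < -\log |\mathcal{I}(G)| / \min\{\varepsilon, \lambda_{\min}\}$, we have

$$\begin{aligned}
F(r) &= \lambda \cdot r - \log\left( \sum_{\sigma \in \mathcal{I}(G)} \exp(\sigma \cdot r) \right) \\
&= \sum_{\rho \in \mathcal{I}(G)} \nu'_\rho \log\left( \frac{\exp(\rho \cdot r)}{\sum_{\sigma \in \mathcal{I}(G)} \exp(\sigma \cdot r)} \right) \\
&\le \nu'_{e_i} \log\left( \frac{\exp(e_i \cdot r)}{\sum_{\sigma \in \mathcal{I}(G)} \exp(\sigma \cdot r)} \right) \\
&\le \min\{\varepsilon, \lambda_{\min}\} \log\left( \frac{\exp(r_i)}{\exp(0)} \right) \\
&< -\log |\mathcal{I}(G)| \\
&= F(0) \\
&\le \sup_{r} F(r).
\end{aligned}$$

This completes the proof of (b), and subsequently that of Lemma 8. □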

Convergence of $r(j)$ to $r^*(\lambda)$. The statement of Lemma 8 suggests that if indeed the algorithm parameter satisfies $r(j) = r^*(\lambda)$, then we have a desirable situation where the effective service rate equals the arrival rate for all nodes, as long as $\lambda \in \Lambda^o$. To this end, we establish that indeed $r(j)$ converges to $r^*(\lambda)$ with probability 1. This is because the update (H]) of scheduling algorithm 1 is essentially a step of an approximate gradient algorithm for solving optimization problem (26). This is made precise in the proof of the following lemma.

Lemma 9 If $\lambda \in \Lambda^o$, then under scheduling algorithm 1,

$$\lim_{j \to \infty} r(j) = r^*(\lambda), \quad \text{component-wise, with probability 1.}$$



Proof. First note that the solution $r^*(\lambda)$ of the concave (maximization) optimization problem (26) can be found iteratively using the gradient algorithm with appropriate step size. The objective is $F(r, \lambda)$; we will drop the reference to $\lambda$ since it is fixed in what follows and use $F(r)$ instead of $F(r, \lambda)$. Now the $i$th component of the gradient vector $\nabla F(r)$ of $F(r)$ is

$$\frac{\partial F}{\partial r_i}(r) = \lambda_i - s_i(r).$$

For a given $i$, as per ([U the $r_i(\cdot)$ is updated as

$$\begin{aligned}
r_i(j+1) &= r_i(j) + \alpha(j) \left( \hat\lambda_i(j) - \hat s_i(j) \right) \\
&= r_i(j) + \frac{1}{j} \left( \lambda_i - s_i(r(j)) + e_i(j) \right) \\
&= r_i(j) + \frac{1}{j} \left( \frac{\partial F}{\partial r_i}(r(j)) + e_i(j) \right),
\end{aligned}$$

where $e_i(j) = (\hat\lambda_i(j) - \hat s_i(j)) - (\lambda_i - s_i(r(j)))$, the $i$th component of $e(j)$, captures the 'approximation' error in estimating the actual gradient direction given by $\lambda_i - s_i(r(j))$. Thus, if $e(j) = 0$ then the update of $r(j)$ is as per the standard gradient algorithm with step size $\alpha(j) = 1/j$. Then standard arguments from optimization theory would imply that $r(j) \to r^*(\lambda)$. But $e(j)$ is a random vector. Therefore, in order to establish the convergence, we will show that the norm of $e(j)$ is sufficiently small. Specifically, we establish the following.



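Lemma 10 The following bound holds:

$$E\left[ \|e(j)\|_1 \right] \le E\left[ \|\hat\lambda(j) - \lambda\|_1 + \|\hat s(j) - s(r(j))\|_1 \right] = O\left( \frac{1}{j^4} \right), \tag{38}$$

where the constant in the O-term in the error may depend on $n$.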



The proof of Lemma 10 is stated in Section 5.1.1. Now, using the bound (38), we will establish the convergence $r(j) \to r^*(\lambda)$. To this end, consider the evolution of $F(r(\cdot))$. By Taylor's expansion (with notation $\delta(j) = \nabla F(r(j)) + e(j)$),

F(r(j + 1)) = F(^r(j) + i[VF(r(i))+e(j)] 

= F(r(j)) + VF(r(i)) • ^^(j) + l^S{tfMd{j) 

> F(r(j)) + -||VF(r(j))||i + -VF(r(j)) • e{j) + l^S{tfM6{j) 

sum 



> F(r(,)) + i||VF(r(,))||i-^('^(^'»"-"^^^')"^ 



IMI 



• (39) 

J- ■ J 2p ^ ^ 

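Here $M$ is an $n \times n$ matrix which, as per Taylor's expansion, consists of the second order partial derivatives of $F$ evaluated at some point. Therefore any element of $M$, say $M_{ab}$ with $1 \le a, b \le n$, is bounded as (using the calculations executed in (30))

$$|M_{ab}| \le \sup_{r} \left| \frac{\partial^2 F}{\partial r_a \partial r_b} \right| = \sup_{r} \left| E_{\pi^r}[\sigma_a]\, E_{\pi^r}[\sigma_b] - E_{\pi^r}[\sigma_a \sigma_b] \right| \le 1. \tag{40}$$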



We also note that each component of the vectors $\nabla F(r(j))$ and $e(j)$ is bounded by a constant, since the cumulative arrival process is Lipschitz and the service process is bounded above by unit rate. Specifically, for any $j$,

$$\|\nabla F(r(j))\|_\infty \le 2, \quad \|e(j)\|_\infty \le K + 1, \quad \|\delta(j)\|_\infty \le K + 3. \tag{41}$$

Taking expectation on both sides of (39) and using (40), (41) and Lemma 10, for all $j \ge C$,



1, 



E[F(r(i + l))] > E[F(r(i))] + -E[||VF(r(j))||2]-0 E 



O 



n 



> E[F(r(j))] + -E ||VF(r(i))||^ -O -o 



n 



(42) 
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Performing summation of (j42p from j = C to cxo, we obtain 



CXJ ^ 

j;-E[||VF(r(j))||2] < 0{n')-E[F{r{C))]+F{r*{X)) 

< oo, 



(43) 



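since $F(r^*(\lambda)) < 0$ from Lemma 8 and, by definition of the algorithm and $F(\cdot)$, $E[F(r(C))] > -\infty$. Now since $\sum_{j \ge C} 1/j = \infty$, we conclude from (43) that

$$0 = \liminf_{j} E\left[ \|\nabla F(r(j))\|_2^2 \right] \ge E\left[ \liminf_{j} \|\nabla F(r(j))\|_2^2 \right] \quad \text{by Fatou's Lemma.} \tag{44}$$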

Therefore, using the property of concave maximization, we have that with probability 1,

$$\liminf_{j} \|\nabla F(r(j))\|_2 = 0 \implies \liminf_{j} \|r(j) - r^*(\lambda)\|_2 = 0. \tag{45}$$

Thus, in order to complete the proof of Lemma 9 it is enough to show that $\|r(j) - r^*(\lambda)\|_2$ converges with probability 1. To this end, consider (with notation $r^* = r^*(\lambda)$, $\delta(j) = \nabla F(r(j)) + e(j)$)



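$$\begin{aligned}
\|r(j+1) - r^*\|_2^2 &\overset{(a)}{\le} \left\| (r(j) - r^*) + \frac{1}{j} \left( \nabla F(r(j)) + e(j) \right) \right\|_2^2 \\
&= \|r(j) - r^*\|_2^2 + \frac{\|\nabla F(r(j)) + e(j)\|_2^2}{j^2} + \frac{2 \delta(j) \cdot (r(j) - r^*)}{j} \\
&\le \|r(j) - r^*\|_2^2 + O\left( \frac{1}{j^2} \right) + \frac{2 \nabla F(r(j)) \cdot (r(j) - r^*) + 2 e(j) \cdot (r(j) - r^*)}{j} \\
&\overset{(b)}{\le} \|r(j) - r^*\|_2^2 + O\left( \frac{1}{j^2} \right) + \frac{2 e(j) \cdot (r(j) - r^*)}{j} \\
&\overset{(c)}{\le} \|r(j) - r^*\|_2^2 + O\left( \frac{1}{j^2} \right) + O\left( \frac{(\log j + \|r^*\|_\infty) \|e(j)\|_1}{j} \right) \\
&\overset{(d)}{\le} \|r(j) - r^*\|_2^2 + O\left( \frac{1}{j^2} \right) + O\left( \frac{\log j}{j} \right) \|e(j)\|_1.
\end{aligned}$$

In the above, (a) follows from ([1T|), (b) follows from the concavity of $F$, i.e. $\nabla F(r(j)) \cdot (r(j) - r^*) \le 0$, (c) follows from the property of the update rule that $\|r(j)\|_\infty = O(\log j)$, and (d) from Lemma 8, which gives $\|r^*\|_\infty = O(1)$. By an application of Lemma 10, we have that

$$\sum_{j} \frac{\log j}{j} E\left[ \|e(j)\|_1 \right] < \infty.$$

Since the terms above are non-negative, by an application of Fubini's theorem and Markov's inequality, we have that with probability 1

$$\sum_{j} \frac{\log j}{j} \|e(j)\|_1 < \infty.$$

Of course, $\sum_j 1/j^2$ is finite. Using this, we have that

$$\|r(j+1) - r^*\|_2^2 \le \|r(j) - r^*\|_2^2 + \gamma_j,$$

where $\sum_j \gamma_j < \infty$ with probability 1. Now the following (standard) fact from analysis (proof is omitted) implies that $\|r(j) - r^*\|$ converges with probability 1 and completes the proof of Lemma 9.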

Proposition 11 Consider two real valued sequences $x_k, y_k$, $k \in \mathbb{N}$, such that for each $k$,

$$x_{k+1} \le x_k + y_k, \quad \text{and} \quad \sum_{k=1}^{\infty} |y_k| < \infty.$$

Then, $\lim_k x_k$ exists.

Wrapping up: establishing rate stability. As an implication of Lemma 9, we establish the rate stability of the queueing network. The following lemma implies Theorem 1.

Lemma 12 Given $\lambda \in \Lambda^o$, under scheduling algorithm 1,

$$\lim_{t \to \infty} \frac{Q_i(t)}{t} = 0, \quad \text{for all } 1 \le i \le n.$$

Proof. Given $\lambda \in \Lambda^o$, recall that $r^*(\lambda)$ is the unique optimal solution of optimization problem (26) as per Lemma 8. In the remainder of the proof, since $\lambda$ is fixed, we will use the notation $F(r) = F(r, \lambda)$ and $r^* = r^*(\lambda)$ as before. Now by Lemma 9, we have $r(j) \to r^*(\lambda)$ with probability 1 as $j \to \infty$. As noted earlier, $\nabla F(r) = \lambda - s(r)$. It can be easily checked that $s(r)$ is continuous as a function of $r$. Therefore with probability 1,

$$\lim_{j \to \infty} \nabla F(r(j)) = \nabla F(r^*) = 0, \tag{46}$$

where the equality to 0, the vector of all 0s, is implied by Lemma 8. Thus, effectively

$$\lim_{j \to \infty} s(r(j)) = \lambda. \tag{47}$$

Lemma 10 implies that with probability 1,

$$\sum_{j=C}^{\infty} \|\hat\lambda(j) - \lambda\|_1 + \sum_{j=C}^{\infty} \|\hat s(j) - s(r(j))\|_1 < \infty. \tag{48}$$

That is, with probability 1,

$$\lim_{j \to \infty} \hat\lambda(j) = \lambda \quad \text{and} \quad \lim_{j \to \infty} \|\hat s(j) - s(r(j))\| = 0. \tag{49}$$

From (47) and (49), with probability 1,

$$\lim_{j \to \infty} \|\hat\lambda(j) - \hat s(j)\| = 0. \tag{50}$$

Now consider a node $i$ and any time $t$. Let $t \in [L(j), L(j+1)] = [L(j), L(j) + T(j)]$ for some $j$. We will bound $Q_i(t)/t$ next. To begin with, note that

$$A_i(0, t) = \sum_{k=0}^{j-2} T(k) \hat\lambda_i(k) + A_i(L(j-1), t).$$

Note that the service provided to the $i$th node in the interval $[L(k), L(k+1)]$ is $T(k) \hat s_i(k)$. Now, for the purpose of upper bounding the queue, we will assume that this service can be used only to serve the work that has arrived in the interval $[L(k-1), L(k)]$. Given this, we obtain the following upper bound (using $Q_i(0) = 0$):


$$Q_i(t) \le \sum_{k=0}^{j-2} \left[ T(k) \hat\lambda_i(k) - T(k+1) \hat s_i(k+1) \right]_+ + A_i(L(j-1), t).$$

Here, we have used the definition $[x]_+ = \frac{x + |x|}{2}$, the non-negative part of $x$, for any $x \in \mathbb{R}$. Since $t \in [L(j), L(j+1)]$ and the cumulative arrival process is Lipschitz, we have

$$A_i(L(j-1), t) \le A_i(L(j-1), L(j+1)) \le K \left( L(j+1) - L(j-1) \right) = K \left( T(j-1) + T(j) \right).$$

And, by definition, $T(k) \le T(k+1)$. Therefore, putting these together we obtain

$$\frac{Q_i(t)}{t} \le \frac{1}{L(j)} \sum_{k=0}^{j-2} \left[ T(k) \hat\lambda_i(k) - T(k) \hat s_i(k+1) \right]_+ + \frac{K \left( T(j-1) + T(j) \right)}{L(j)}. \tag{51}$$

Consider the first term on the RHS of (51). From the limits established above, it follows that $\hat\lambda_i(k) - \hat s_i(k+1) \to 0$ as $k \to \infty$. Moreover, $L(j) \ge \sum_{k=0}^{j-1} T(k)$ and $L(j) \to \infty$. Therefore, it easily follows that as $j \to \infty$, the first term goes to 0. Now consider the second term on the RHS of (51). Since $T(j) = \exp(\sqrt{j})$, $T(j)/L(j) \to 0$ as $j \to \infty$. In summary, from this discussion and (51) we obtain that for any $i$, with probability 1,

$$\lim_{t \to \infty} \frac{Q_i(t)}{t} = 0.$$

This completes the proof of Lemma 12.

□



□ 



5.1.1 Proof of Lemma 10

Note that, as per the update (jH) of scheduling algorithm 1, the r(j) is such that 



|r(j)|oo < = Oilogj). 



k=l 
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Therefore, the statement of Lemma [TO] follows by establishing existence of C so that for j > C, 



E 



|A(i)-A||i + ||s(i)-s(r(i))||i|r(i) 



O 



1 



(52) 



for |r(j)|oo = O(logj). In the remaining proof, for simplicity of notation we will drop reference 
r(j) and simply use E[-] in place of E[-|r(j)]. We will establish that by arguing separately that 



E 



|A(i) 



0(1/ f) and E[||s(j) -s(r(i))||i] = 0(1/ f). 



First, we consider the deviation in A(j). This will immediately follow from the property of 
arrival process. By definition X{j) is the empirical arrival rate vector over [L(j), L(j + 1)). Now 
for any i, 



Ai(i) 



1 



ML{j),L{j + l)) 



1 /^^^^ 



^A,{L{j)+k-l,L{j) + k) 

\k=l 



(53) 



Now, $X_k = A_i(L(j) + k - 1, L(j) + k)$ are i.i.d. random variables with $E[X_k] = \lambda_i$ and bounded support $[0, K]$, and hence standard deviation at most $K$. Using this, we have



E 



|Ai(i) - Ail 



< 



< 



< 




Y,iXk-nxk]) 

k=l 



\k=i 

1/2 



1/2 



(54) 



where the last inequality follows from $T(j) = \exp(\sqrt{j})$. This completes the proof of the bound on the deviation for $\hat\lambda(j)$.

Now, we consider deviations in s(j) compared to s(r(j)). For this, first we establish E[s(j)] 
being close to s(r(j)) and then we establish s(j) being close to E[s(j)]. Therefore, we start by 
evaluating deviation between E[s(j)] and s(r(j)). To this end, consider any i. We will establish 
that, 



nUj)]-s.{r{j))\ 



(55) 
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To establish (j55p . we will use the mixing time bounds (j23p derived in Section [4.21 next. To this 
end, let fi{t) be the distribution over I{G) of scheduling decisions at time t E [L{j),L{j + 1)). 
By Lemma [8]^2), Sj(r(j)) = E^r{j)[(Tj]. And is — 1 valued random variable. Therefore, 



|lE;x(t)ki] - Si(r(j))| 



< 
< 



TV 



(56) 



where the last inequality follows from (1131) . Now, from (I23j) . the RHS of (j56p is bounded above 
by 0(l/j^) as long as 



t > L{j) + (exp(e(n|r|max + "-)) log/) 
= L{j)+f(^hogj = L{j)+Tij), 



(57) 



where T{j) = log j. In above, while applying (j23]) . we have used the fact |r|oo = 0(log j). 
This leads to the following bound. 



|E[s.(j)]-s*(r(i))l 



1 



< 



m 



+ 



dt 



;4 



(58) 



Hence the (|55p follows since 

(jOW log j)/T(i) = 0{l/f) due to choice of T{j) = exp(v(7). 
Given ([55]) . as the last step to establish E[||s(j) — s(r(j))||i] = 0(1//), we will show that 
for any i. 



nMj)-nHm] 

Consider (with notation 5 = [L{j),L{j + 1))), 



O 



(59) 



njfn\uj)-m{3)w 



E 



< E 



/■i-o+i) r r 

/ o-i(t)(it-E / 

JL(j) \_Jl 

( rLU+^) I r 

/ a^{t) dt-E 

\ JL(j) Jl 



E 



™L{i+l) 
"L(j+1) 

"L(j+1) 
'L{j) 



iTj(t) (it 
(7j(t) dt 









) 


- E 


/ (Ji{t) dt 









L{j) 



E[a,(t)] 



E[aiis)\aiit) = 1] -E[a,(s)] ds dt 
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™L(j+l) / rL(j+l) \ ^ 

2( ^^^ ^ E[a,{t)]lJ^ E[a,{s)\a,{t) = l]-E[a,{s)]ds\ dt 

-L(j+1) / rt+r{j) 

2( / lEh(t)] / K[ai{s)\ai{t) = l]-E[aiis)]ds 

'L(j) \ Jt 

+ / E[aiis)\ai{t) = 1] - E[ai{s)] ds] dt] 

Jt+T(i) / / 



(a) / riO'+l) / i-t+T{j) rL(j+l) / 1 

< 2 / E[ai{t)] / 1 ds+ / O ^ ) ds ) 

V Jlo) \Jt Jt+T{j) \J 



< 2T{j){T{j) + 



m 

In above, (a) follows from choice of T{j) as in (|56p . ()57p . if s > t + T{j) then due to the 'mixing 
effect' E[(Tj(s)|(Tj(t)], E[(7j(s)] are within 0(l/j^) of Sj(r(j)). Now, ([60]) immediately implies that 

E[|s,(i)-E[?,(j)]|] = (61) 

To conclude, observe that ([54l) . ([55]) and (f6T|l imply the result of Lemma [TOl 

5.2 Proof of Theorem 2: Positive Harris Recurrence

The goal of this section is to prove Theorem 2, that is, the positive Harris recurrence of the network Markov process under Scheduling Algorithm 2. For a countable Markov chain, positive recurrence means that all states are visited infinitely often, with a finite mean inter-visit time. When the state space is not countable (as in our case), one cannot expect every state to be visited infinitely often. However, a small set of states can have that property. If the transition probabilities out of that set are similar, then the set plays the role of a recurrent state. Indeed, the evolution essentially starts afresh once the chain hits that set. This idea is made precise by the definition of a petite set. Section 4.3 reviews known results about establishing positive Harris recurrence. In particular, Theorem 6 there states that the existence of a positive recurrent closed petite set implies positive Harris recurrence.

The appropriate petite set is the set $S$ where the sum of the squares of the queue lengths is less than some constant $\kappa$. The positive recurrence is proved using the fact that the sum of the squares of the queue lengths is a Lyapunov function which tends to decrease when it is larger than $\kappa$ (Lemma 13). Intuitively, this is true because Scheduling Algorithm 2 tries to balance $\hat s_i(j)$ and $\hat\lambda_i(j) + \varepsilon$ for all $i$, so that on average, the service rate dominates the arrival rate on each queue. The set $S$ is shown to be petite (Lemma 14) by proving that starting from any state in that set, there is some lower bound $\theta$ on the probability that, at some later time $T$, the queues become empty, no link is active, and the parameters $r$ of the CSMA backoff delays reach their maximum value (Proposition 17). Thus, the evolution of the Markov chain essentially starts afresh from that set with probability at least $\theta$.

Recall that petite means small in French.

To this end, we start with necessary definitions of the network Markov process under schedul- 
ing algorithm 2. Let r G N U {0} be the index for the discrete time. It can be checked that 
the tuple X{j) = {Q{Tj), r(Tj), a{Tj)) forms the state of the time-homogeneous Markov chain 
operating under the algorithm. Now X{t) G X where X = M" x [— |, x T{G). Clearly, X is 
a Polish space endowed with the natural product topology. Let Bx be the Borel a-algebra of X 
with respect to this product topology. Finally, for x = (Q,r, cj) S X, define norm of x denoted 
by |x| as 

|x| = IQI + |r| + |(t|, 

where |Q|,|r| and |S^| denotes the li norm, \a\ is its index in {0, . . . , |X(G)| — 1}, assigned 
arbitrarily. Thus, |r|, are always bounded. Therefore, in essence |x| ^ oo iff |Q| oo. 

To establish the statement of Theorem 2, we need to show that $X(\tau)$ is indeed positive Harris recurrent as long as $\lambda + 2\varepsilon \mathbf{1} \in \Lambda$. By Theorem 6, it is sufficient to find a positive recurrent closed petite set. First, we will find a closed recurrent set using the criterion of Lemma 7 and then establish that the set is indeed petite. To this end, define a Lyapunov function $L : \mathsf{X} \to \mathbb{R}_+$ as



L(x) = ^ = • 1, where x = (Q, r, a) £ X. 



N = N{e,n) 



^ -2 

i=l 

We establish the following 'drift' property about L. 
Lemma 13 Given X so that X + 2el G A, define 

"48 X 16 X 72n^' 

Then, for any initial state X{0) = (Q(0), r(0), <t(0)) G X, 

E[L(X(iV))-L(X(0))|X(0)] < -HXiO)), (62) 

where h : X is defined as 

/i(x) = eTN{Q{0)-l)-7i{TNf{e + K^ + 2K). (63) 

Therefore, Lemma 7 implies that for some finite $\kappa > 0$, the set $B_\kappa = \{x : L(x) \le \kappa\}$ satisfies

$$E_x\left[T_{B_\kappa}\right] < \infty \ \text{ for any } x \in \mathsf{X}, \qquad \sup_{x \in B_\kappa} E_x\left[T_{B_\kappa}\right] < \infty.$$

Therefore, by Theorem 6 the following is sufficient to complete the proof of Theorem 2.

Lemma 14 Consider any $\kappa > 0$. Then, the set $B_\kappa = \{x : L(x) \le \kappa\}$ is a closed petite set.

In the remainder of this sub-section, we shall prove Lemmas 13 and 14.

5.2.1 Proof of Lemma 13



A relevant optimization problem. The basic idea behind the update algorithm ([6]) is to 
design a simple gradient procedure for solving the following optimization problem. 

maximize F(r, A + el) = F^{r) 

subject to r G M". (64) 

By Lemma 8, it follows that if $\boldsymbol{\lambda} + 2\varepsilon\mathbf{1} \in \Lambda$, then (64) has a unique solution that is attained;
let it be $\mathbf{r}^* = \mathbf{r}^*(\boldsymbol{\lambda} + \varepsilon\mathbf{1})$. Then, from Lemma 8(2), the effective service rate $\mathbf{s}(\mathbf{r}^*)$, under the
random access algorithm with fixed $\mathbf{r}^*$, is such that

$$s_i(\mathbf{r}^*) = \lambda_i + \varepsilon.$$

That is, the arrival rate is less than the service rate by $\varepsilon > 0$ under this idealized setup. In order
to establish positive Harris recurrence, we will need more than this: the service rate should
dominate the arrival rate over a small enough time interval to imply the drift condition required
by the Lyapunov-Foster criterion. This is exactly what we will establish next.
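
As an illustration of the idealized setup (not part of the proof), the following sketch solves (64) by plain gradient ascent on a toy conflict graph and checks that the resulting service rate equals $\lambda_i + \varepsilon$. It assumes $F_\varepsilon(\mathbf{r}) = (\boldsymbol{\lambda} + \varepsilon\mathbf{1}) \cdot \mathbf{r} - \log \sum_{\sigma} \exp(\sigma \cdot \mathbf{r})$, whose gradient is $\boldsymbol{\lambda} + \varepsilon\mathbf{1} - \mathbf{s}(\mathbf{r})$. The 3-node path graph, the rates, the step size and all function names are hypothetical choices made for the example, and exact service rates are used in place of the empirical estimates of the actual algorithm.

```python
import itertools
import math

def independent_sets(n, edges):
    """All 0/1 vectors on n nodes in which no two adjacent nodes are both active."""
    sets = []
    for sigma in itertools.product((0, 1), repeat=n):
        if all(not (sigma[a] and sigma[b]) for a, b in edges):
            sets.append(sigma)
    return sets

def service_rate(r, ind_sets):
    """s(r) = E[sigma] under pi^r, where pi^r(sigma) is proportional to exp(sigma . r)."""
    weights = [math.exp(sum(s_i * r_i for s_i, r_i in zip(sigma, r)))
               for sigma in ind_sets]
    z = sum(weights)
    return [sum(w * sigma[i] for w, sigma in zip(weights, ind_sets)) / z
            for i in range(len(r))]

def solve_r_star(target, ind_sets, step=0.5, iters=20000):
    """Gradient ascent on F(r) = target . r - log Z(r); the gradient is target - s(r)."""
    r = [0.0] * len(target)
    for _ in range(iters):
        s = service_rate(r, ind_sets)
        r = [r_i + step * (t_i - s_i) for r_i, t_i, s_i in zip(r, target, s)]
    return r

# Hypothetical example: path graph 0 - 1 - 2 with strictly feasible rates.
edges = [(0, 1), (1, 2)]
ind_sets = independent_sets(3, edges)
lam = [0.3, 0.3, 0.3]
eps = 0.05
target = [l + eps for l in lam]          # lambda + eps * 1
r_star = solve_r_star(target, ind_sets)
s_star = service_rate(r_star, ind_sets)  # should match target componentwise
```

In this toy instance the gap between service and arrival rate at $\mathbf{r}^*$ is exactly $\varepsilon$ in every coordinate, which is the slack the drift argument exploits.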

Derivative of $F_\varepsilon$ becomes small. As per the statement of Lemma 13, let the initial state be $X(0) =
(\mathbf{Q}(0), \mathbf{r}(0), \boldsymbol{\sigma}(0))$. As the first step, we wish to establish the following:

$$\frac{1}{N}\sum_{j=1}^{N} \mathbb{E}\left[\|\nabla F_\varepsilon(\mathbf{r}(j))\|_2^2\right] = \frac{1}{N}\sum_{j=1}^{N} \mathbb{E}\left[\|\boldsymbol{\lambda} + \varepsilon\mathbf{1} - \mathbf{s}(\mathbf{r}(j))\|_2^2\right] \le \frac{\varepsilon^2}{16}. \tag{65}$$

In the above and everywhere else in the proof of Lemma 13, the expectation is always assumed to
be conditioned on the initial state $X(0)$. For simplicity we will drop reference to this conditioning.
Intuitively, (65) implies that on average and in expectation, the arrival rate $\boldsymbol{\lambda}$ is strictly
less than the normalized service rate $\mathbf{s}(\mathbf{r}(j))$ after a finite time $N$. This will allow us to establish
drift in the Lyapunov function. To this end, we start with the definition $G(\mathbf{r}) = F_\varepsilon(\mathbf{r}) - \|\mathbf{r} - \mathbf{r}^*\|_2^2$. We
establish the following useful non-decreasing property of $G(\cdot)$ under the 'projection' defined in

©• 

Lemma 15 For any r G [-f , f]" and Ar G [-1, 1]", < G{r) < and G([r + Ar]^) > 

G(r + Ar). 

Proof. G{r) is upper bounded by since -Fe(r) < F^{r*) < by Lemma El Further, 
G(r) = F,(r)- ||r-r*||2 

> (A + e • 1) • r — log 2_, exp((T • r) 



> n- (-j) -log(2"exp(nrmax)) -n 

> (66) 
Here (a) follows from Lemma [8^3) for r* = r*(A + el) (thus r* G [-|, j]^), and the last step 
has used n > 3. Now if we set x = r + Ar, |x|max !^ f + 1 and we need to show G([x]s.) > G(x). 
Note that it is enough to show that for any i £ V, 



G([x]n,,)>G(x), 
where the i-projection x = [xjn j is defined as 



(67) 



|x|n I if j = i 

otherwise 

Then we can iteratively apply (j67p to complete the proof. When Xi S [— fif], desired claim 
trivially follows as [x]s, j = x. Now suppose Xi ^ ["fif]- By definition, it must be that 
e (f , f + 1] or Xi G [-f - 1, -f ). We prove ([67]) when Xi G [f , f + 1]; the other arguments 
for the other case are very similar. Consider, 

G([x]^,)-G(x) = F,([x]^,)-F,(x)-(^-r*)V(x,-r*)2 



(a) 

> - 

> - 

> 0. 



+ X 



Xi + 



n 



2r* 



- + 2 Xi - - 



n 



dr-i 



< 1 and |r* 



< 



n-log(2) 



In above, (a) and (b) is due to 
assumption), respectively. This completes the proof of Lemma [T5l 

Now consider the relation between $G(\mathbf{r}(j+1))$ and $G(\mathbf{r}(j))$.



< J — 1 (since n > 3 by 

□ 



G(r(i + 1)) 



G [r(i)+a(VF,(r(i)) + e(j))]zi 



> G(r(i)+a(VF,(r(i))+e(i))) 



*l|2 



Fe (r(j) + a (VFMj)) + - \\r{j) + a (VF,(r(j)) + e(j)) - r 

FMj)) + VFe(r(j)) • a (VF,(r(i)) + e{j)) 

(VFMj)) + e{j)) -M-a (VF,(r(j)) + e(j)) 
-||r(j) - r*\\l - 2aVFe{v{j)) ■ (r(i) - r*) - 2a ■ e(i) • (r(j) - r*) 
-a^\\VFMj))+e{j)\\l 



> F,(r(i)) +a||VF,(r(j))||2 + aVF,(r(j)) • e{j) 



r(j) - r*\\'i - 2a ■ e{j) ■ {r{j) - r*) - a\K + ifn" 



> F,(r(i)) +a||VF,(r(j))||2 -a||e(j)||i 



2,a\K + lfn'^ 



„* l|2 



2a\\e 



2n 
1 X — 

e 
= G(r(j)) + aWVFMMll " « + " ) ^^"^ 

> G(r(j)) + a||VF,(r(j))||2-^||e(j)||i ^^-^ , (68) 

where the random vector e{j) = Xi{j) — Sj(j) — (Aj — Sj(r(j))); M is the n x n with M^;, = 
d'^ Fi;{r) / dradri, for some r in neighborhood of r with Mab G [— 1> 1] by ([10|) . In above (a) follows 
from the fact that a < {K + ly^, VF<,(r(j)) G [-1, 1]", e(j) G and Lemma[l4|. For 

(b), we use that |M|oo < 1 and the concavity of and VFe{r{j)) + e(j) G [-2, (/C + 1)]". 
Finally (c) follows from VF^{r{j)) G [-1, 1]" and r{j) - r* G [-^, ^]". 

Our choice of the large updating period $T$ is merely for bounding $\mathbf{e}(j)$, and we obtain the
following lemma, which is analogous to Lemma 10.



Lemma 16 // the updating period T > exp 

e log f K then for all j G N 



|A(j)-A||i + ||s(j)-s(rO-))||i 



< 



240n' 



£3 



E 

Therefore, for all j G N 

E[|K.)IKI<,,„„ 

Proof. We provide sketch proof here since the proof of Lemma [TBI is essentially the same as that 
of Lemma[10]- replace T{j) = T, a{j) = a for all j and use |r|max < f to obtain bound of 

on mixing time of the Markov chain on I{G) using As a consequence, it follows that by 
choice of T with large enough constant in its exponent, as stated in Lemma [TBI the expectation 
of ||e(j)||i can be made smaller than any given constant. Specifically, it can be made smaller 
than □ 

Summing (68) from $j = 1$ to $N$,
> G(r(A + l)) 

> GWD) + . ||VF.(r(j))||A - (f: 11.0)11. j - ='°'<^; ')'"\ . (69) 
Taking expectation on both sides and dividing by $\alpha N$,

^j:E[\\VFMm\l] < -^Giril)) + ^^E[\\emi] + '-^^^^^ 

i'^) 1 I6n^ ^2 

- aiV^2~ + 48 + 48 

< — , (70) 

- 16 ^ ^ 



This is the main reason why we consider $G$ instead of $F_\varepsilon$, as we cannot establish monotonicity of $F_\varepsilon$ under
the projection.
since 



48xl6n^^ 




"48xl6x72n5' 









a = 72(i^+i)2„2 and Lemmas [15] and [T6l 



Service rate dominates arrival rate. Next, we wish to establish that the average of the empirical
service rate dominates the average arrival rate over a time interval of length $N$. That is, for all $i$,

$$\frac{1}{N}\sum_{j=1}^{N} \mathbb{E}[\hat{s}_i(j)] \ge \lambda_i + \varepsilon/2. \tag{71}$$



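To this end, first note that

$$\left|\frac{1}{N}\sum_{j=1}^{N} \mathbb{E}\left[\frac{\partial F_\varepsilon}{\partial r_i}(\mathbf{r}(j))\right]\right| \le \frac{1}{N}\sum_{j=1}^{N} \mathbb{E}\left[\left|\frac{\partial F_\varepsilon}{\partial r_i}(\mathbf{r}(j))\right|\right] \le \sqrt{\frac{1}{N}\sum_{j=1}^{N} \mathbb{E}\left[\left(\frac{\partial F_\varepsilon}{\partial r_i}(\mathbf{r}(j))\right)^2\right]} \ \text{(from Cauchy-Schwarz inequality)} \ \le \frac{\varepsilon}{4}, \tag{72}$$

where the last inequality is from (70). Therefore,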

N 



1 

- \Yins^{j)]-^^ 



> 



1 

N 
1 

N 



(a) 1 
> — 
- N 



> 



Y^ ((E[.,(r(i))] - A.) + {mm - E[s.(r(j)))]) 

AT 

((E[.,(r(j))] - A,) - mm - nsMMi) 

N 



dF, 



dr.: 



£3 \ 



240n 



1 ^ 

IVe 



r(j)) 



(b) 
> 



(73) 



In the above, (a) follows from Lemma 8(2), i.e.

$$\frac{\partial F_\varepsilon}{\partial r_i}(\mathbf{r}(j)) = \lambda_i + \varepsilon - s_i(\mathbf{r}(j)),$$

and from Lemma 16. The (b) follows from (72).

Wrapping up: Negative drift. Now, consider $Q_i(N)$. For this, suppose $Q_i(0) \ge TN$. Then,
$Q_i(\cdot)$ is strictly positive over the interval $[0, TN]$, as the service rate is at most 1. Therefore, in that case
the queue $Q_i(\cdot)$ is fully served in time $[0, TN]$. Hence, using (73), we conclude that

$$\mathbb{E}[Q_i(TN)] = Q_i(0) + T \sum_{j=1}^{N} \mathbb{E}[\lambda_i - \hat{s}_i(j)] \le Q_i(0) - \frac{\varepsilon}{2}\, TN, \tag{74}$$

if $Q_i(0) \ge TN$. In the above, as usual, we have assumed that the expectation is conditional with
respect to $X(0)$. In what follows, we will use this conditioning explicitly. Given (74), we have

$$\begin{aligned}
\mathbb{E}[Q_i^2(TN) - Q_i^2(0) \mid X(0)] &= \mathbb{E}[(Q_i(TN) - Q_i(0))(Q_i(TN) + Q_i(0)) \mid X(0)] \\
&= \mathbb{E}[(Q_i(TN) - Q_i(0))^2 + 2Q_i(0)(Q_i(TN) - Q_i(0)) \mid X(0)] \\
&\overset{(a)}{\le} (KTN)^2 + 2Q_i(0)\,\mathbb{E}[Q_i(TN) - Q_i(0) \mid X(0)] \\
&\overset{(b)}{\le} \begin{cases} (KTN)^2 - \frac{\varepsilon}{2}\, TN \times 2Q_i(0) & \text{if } Q_i(0) \ge TN \\ (K^2 + 2K)(TN)^2 & \text{if } Q_i(0) < TN \end{cases} \\
&\le -\varepsilon TN\, Q_i(0) + \varepsilon (TN)^2 + (K^2 + 2K)(TN)^2,
\end{aligned} \tag{76}$$

for all $\mathbf{Q}(0)$. In the above, (a) is from the boundedness of the arrival process and (b) is from (74). Hence,

$$\mathbb{E}[L(X(N)) - L(X(0)) \mid X(0)] = \mathbb{E}\left[\sum_{i=1}^{n} Q_i^2(TN) - \sum_{i=1}^{n} Q_i^2(0) \,\Big|\, X(0)\right] \le -\varepsilon TN \left(\sum_{i} Q_i(0)\right) + \varepsilon n (TN)^2 + n(K^2 + 2K)(TN)^2.$$

This completes the proof of Lemma 13.
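
The passage from the two-case bound in step (b) to the single bound (76) can be checked numerically. The sketch below is only a sanity check under randomly drawn, hypothetical parameter values; the function names are illustrative and do not appear in the paper.

```python
import random

def casewise_bound(q0, T, N, K, eps):
    """Right-hand side of step (b): the bound depends on whether Q_i(0) >= TN."""
    tn = T * N
    if q0 >= tn:
        return (K * tn) ** 2 - eps * tn * q0
    return (K ** 2 + 2 * K) * tn ** 2

def unified_bound(q0, T, N, K, eps):
    """Right-hand side of (76), which should dominate both cases for every Q_i(0) >= 0."""
    tn = T * N
    return -eps * tn * q0 + eps * tn ** 2 + (K ** 2 + 2 * K) * tn ** 2

random.seed(0)
violations = 0
for _ in range(10000):
    T = random.randint(1, 20)
    N = random.randint(1, 20)
    K = random.randint(1, 5)
    eps = random.uniform(0.01, 0.5)
    q0 = random.uniform(0.0, 3.0 * T * N)   # covers both Q_i(0) < TN and Q_i(0) >= TN
    if casewise_bound(q0, T, N, K, eps) > unified_bound(q0, T, N, K, eps) + 1e-6:
        violations += 1
```

In the first case the slack is $\varepsilon (TN)^2 + 2K(TN)^2$; in the second it is $\varepsilon TN\,(TN - Q_i(0)) > 0$, so no violations are expected.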
5.3 Proof of Lemma 14

We wish to establish that the set $B_\kappa = \{\mathbf{x} : L(\mathbf{x}) \le \kappa\}$ is a closed petite set. By definition, it is
closed. To establish that it is a petite set, we need to find a non-trivial measure $\mu$ on $(\mathsf{X}, \mathcal{B}_{\mathsf{X}})$ and
a sampling distribution $a$ on $\mathbb{N}$ so that for any $\mathbf{x} \in B_\kappa$,

$$K_a(\mathbf{x}, \cdot) \ge \mu(\cdot).$$

To construct such a measure $\mu$, we shall use the following Proposition.

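Proposition 17 Let the network Markov chain $X(\cdot)$ start with state $\mathbf{x} \in B_\kappa$ at time 0, $X(0) =
\mathbf{x}$. Then, there exist $T_\kappa \ge 1$ and $\gamma_\kappa > 0$ such that

$$\sum_{\tau=1}^{T_\kappa} \Pr{}_{\mathbf{x}}(X(\tau) = \mathbf{y}) \ge \gamma_\kappa, \quad \forall \mathbf{x} \in B_\kappa.$$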

Here y = (0, [j],0) G X denote the state where all components of Q and a (i.e. the schedule is 
the empty independent set) and ri = j for all i £ V . 

Proof. Consider any $\mathbf{x} \in B_\kappa$. By definition, the total amount of work in each queue is no more
than $\sqrt{\kappa} + 1$. Consider some large enough (soon to be determined) $T_\kappa$. By the property of the
assumed arrival process, there is a positive probability $\theta^0_\kappa > 0$ of no arrivals happening to the
system in time $T_\kappa$. Assuming no arrivals happen, we will show that in large enough time $t^1_\kappa$,
with probability $\theta^1_\kappa > 0$, each queue receives at least $\sqrt{\kappa} + 1$ amount of service; and after that, in
additional time $t^2_\kappa$, with positive probability $\theta^2_\kappa > 0$, the empty set schedule is reached. Now, after
the empty set schedule is reached, in additional time $t^3_\kappa$, with positive probability $\theta^3_\kappa > 0$, the
empty set schedule remains; i.e. the schedule does not change in this time. Since the empty
set schedule remains and no packet arrives, is increasing by e from ([6]) and finally reach j for 

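a large enough $t^3_\kappa$ which depends on $n$. This will imply that by defining $T_\kappa = t^1_\kappa + t^2_\kappa + t^3_\kappa$, the
state $\mathbf{y} \in \mathsf{X}$ is reached with probability at least

$$\gamma_\kappa \triangleq \theta^0_\kappa\, \theta^1_\kappa\, \theta^2_\kappa\, \theta^3_\kappa > 0.$$

And this will immediately imply the desired result of Proposition 17. To this end, we need to
show the existence of $t^1_\kappa, t^2_\kappa, t^3_\kappa$ and $\theta^1_\kappa, \theta^2_\kappa, \theta^3_\kappa$ with the properties stated above to complete the proof of
Proposition 17.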

First, we show the existence of $t^1_\kappa, \theta^1_\kappa$. For this, note that the Markov chain corresponding to the
scheduling algorithm always has bounded transition probabilities (since $\mathbf{r}$ is bounded in terms
of $n$) and is irreducible over the space of all independent sets $\mathcal{I}(G)$. Therefore, it follows that
starting from any initial scheduling configuration, there exists a finite time $\hat{t}$ such that a schedule
is reached so that any given queue $i$ is scheduled for at least a unit amount of time with probability
at least $\hat{\theta} > 0$. Here, both $\hat{t}, \hat{\theta}$ depend only on $n$ (and $\varepsilon$), not $\kappa$. Therefore, it follows that in
time $t^1_\kappa = (\sqrt{\kappa} + 1)\, n\, \hat{t}$ all queues become empty with probability at least $\theta^1_\kappa = (\hat{\theta})^{n(\sqrt{\kappa}+1)}$. Next,
the existence of $t^2_\kappa, \theta^2_\kappa$ also follows from the bounded property of our Markov chain. Finally,
for $t^3_\kappa, \theta^3_\kappa$, consider the interpretation of the Markov chain as in Section 4.2 using the clock ticks.
Note that no clock ticks in time $t^3_\kappa$ with probability $\theta^3_\kappa > 0$, since its rate is bounded in terms of
$n$. Hence, the empty set schedule remains in time $t^3_\kappa$ with probability $\theta^3_\kappa > 0$, where $t^3_\kappa$ and $\theta^3_\kappa$
depend only on $n$. This completes the proof of Proposition 17. □

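In what follows, Proposition 17 will be used to complete the proof of Lemma 14. To this end,
consider Geometric(1/2) as the sampling distribution $a$, i.e.

$$a(i) = 2^{-i}, \quad i \ge 1.$$

Let $\delta_{\mathbf{y}}$ be the delta distribution on element $\mathbf{y} \in \mathsf{X}$. Then, define $\mu$ as

$$\mu = 2^{-T_\kappa} \gamma_\kappa \delta_{\mathbf{y}}, \quad \text{that is} \quad \mu(\cdot) = 2^{-T_\kappa} \gamma_\kappa \delta_{\mathbf{y}}(\cdot).$$

Clearly, $\mu$ is a non-trivial measure on $(\mathsf{X}, \mathcal{B}_{\mathsf{X}})$. With these definitions of $a$ and $\mu$, Proposition 17
immediately implies that for any $\mathbf{x} \in B_\kappa$,

$$K_a(\mathbf{x}, \cdot) \ge \mu(\cdot).$$

This establishes that the set $B_\kappa$ is a closed petite set, and this completes the proof of Lemma 14.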
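
The minorization step can be illustrated on a small finite chain. The sketch below is a toy analogue rather than the network Markov chain itself: the 3-state transition matrix, the horizon and all names are hypothetical. It computes the sampled kernel $K_a = \sum_{i \ge 1} 2^{-i} P^i$ (truncated, which only lowers it) and checks that it dominates $2^{-T_\kappa} \gamma_\kappa$ at the target state, since each $\tau \le T_\kappa$ has weight $2^{-\tau} \ge 2^{-T_\kappa}$.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def sampled_kernel(p, horizon):
    """Truncation of K_a = sum_{i>=1} 2^{-i} P^i after `horizon` terms (a lower bound on K_a)."""
    n = len(p)
    total = [[0.0] * n for _ in range(n)]
    power = [row[:] for row in p]
    for i in range(1, horizon + 1):
        for x in range(n):
            for y in range(n):
                total[x][y] += 2.0 ** (-i) * power[x][y]
        power = mat_mul(power, p)
    return total

def reach_sum(p, y, t_max):
    """For each start x, the sum over tau = 1..t_max of Pr_x(X(tau) = y)."""
    n = len(p)
    acc = [0.0] * n
    power = [row[:] for row in p]
    for _ in range(t_max):
        for x in range(n):
            acc[x] += power[x][y]
        power = mat_mul(power, p)
    return acc

# Hypothetical 3-state chain; state 0 plays the role of the target state y.
P = [[0.5, 0.5, 0.0],
     [0.2, 0.3, 0.5],
     [0.0, 0.4, 0.6]]
T_kappa = 4
y = 0
gamma = min(reach_sum(P, y, T_kappa))    # plays the role of gamma_kappa
mu_mass = 2.0 ** (-T_kappa) * gamma      # mass that mu places on y
K_a = sampled_kernel(P, horizon=60)
```

Every row of `K_a` has at least `mu_mass` in column `y`, which is the petiteness inequality in this finite setting.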

6 Throughput & Fairness of Congestion Control Algorithms 
6.1 Proof of Theorem 3: Rate Stable Congestion Control 

The proof of Theorem 3 is similar to that of Theorem 1. In a nutshell, the basic idea is to show
that the update equation (8) solves a relevant optimization problem through a subgradient
algorithm. That is, $\boldsymbol{\lambda}(j), \mathbf{r}(j)$ converge to the solution of the appropriate optimization problem
with probability 1. The properties of the optimization problem will imply that the limiting
arrival rates have good utility. This convergence will, in turn, imply rate
stability of the queue sizes.

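A relevant optimization problem & its properties. Let $\mathcal{M}$ be the space of all probability
distributions on $\mathcal{I}(G)$. Given a distribution $\mu \in \mathcal{M}$, let $H_{ER}(\mu)$ denote its entropy, defined as

$$H_{ER}(\mu) = -\sum_{\sigma \in \mathcal{I}(G)} \mu_\sigma \log \mu_\sigma.$$

Consider the following optimization problem.

$$\text{maximize } H_{ER}(\mu) + \beta\left(\sum_i U_i(\lambda_i)\right) \text{ over } \mu \in \mathcal{M},\ \boldsymbol{\lambda} \in [0,1]^n \quad \text{subject to } \mathbb{E}_\mu[\sigma_i] \ge \lambda_i, \text{ for all } i. \tag{77}$$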

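Associate a dual variable $r_i \ge 0$ with the constraint $\mathbb{E}_\mu[\sigma_i] \ge \lambda_i$. Here the use of $r_i$ for the dual variable
is an intentional abuse of notation, and the reason behind this will soon become clear to the
reader. Given this, the resulting Lagrangian is given by

$$\mathcal{L}(\mu, \boldsymbol{\lambda}; \mathbf{r}) = H_{ER}(\mu) + \beta\left(\sum_i U_i(\lambda_i)\right) + \sum_i r_i\left(\mathbb{E}_\mu[\sigma_i] - \lambda_i\right) = \left(H_{ER}(\mu) + \sum_i r_i \mathbb{E}_\mu[\sigma_i]\right) + \left(\sum_i \left[\beta U_i(\lambda_i) - r_i \lambda_i\right]\right). \tag{78}$$

And, therefore, the dual function is given by

$$\mathcal{D}(\mathbf{r}) = \sup \mathcal{L}(\mu, \boldsymbol{\lambda}; \mathbf{r}) \text{ over } \mu \in \mathcal{M},\ \boldsymbol{\lambda} \in [0,1]^n. \tag{79}$$

Finally, the dual optimization of (77) is given by

$$\text{minimize } \mathcal{D}(\mathbf{r}) \text{ over } \mathbf{r} \in \mathbb{R}_+^n. \tag{80}$$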

Now we are ready to state useful properties of the optimization problems (77) and (80). These
properties were presented in earlier work [26].

Lemma 18 The optimization problem (77) is a concave maximization, while the optimization
problem (80) is a convex minimization. There is no duality gap, and hence both have the same
optimal cost. They satisfy the following properties.

(1) Given dual feasible $\mathbf{r} \in \mathbb{R}_+^n$, the associated primal feasible assignment $\mu(\mathbf{r}), \boldsymbol{\lambda}(\mathbf{r})$ are given
as follows:

$$\mu_\sigma(\mathbf{r}) \propto \exp(\sigma \cdot \mathbf{r}), \text{ for all } \sigma \in \mathcal{I}(G). \tag{81}$$

That is, $\mu(\mathbf{r}) = \pi^{\mathbf{r}}$. And

$$\lambda_i(\mathbf{r}) = \arg\max_{y \in [0,1]} \left(\beta U_i(y) - r_i y\right), \text{ for all } i. \tag{82}$$

(2) The subgradient of $\mathcal{D}(\mathbf{r})$, represented as $\mathbf{g}(\mathbf{r}) = [g_i(\mathbf{r})]$, is given by

$$g_i(\mathbf{r}) = \mathbb{E}_{\mu(\mathbf{r})}[\sigma_i] - \lambda_i(\mathbf{r}).$$

(3) And, both problems have unique optimal solutions.
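
Parts (1) and (2) can be exercised numerically on a toy instance. The sketch below evaluates the dual function as $\mathcal{D}(\mathbf{r}) = \log Z(\mathbf{r}) + \sum_i \max_y (\beta U_i(y) - r_i y)$, using the fact (see Lemma 21) that the entropy-regularized maximum over $\mu$ equals $\log Z(\mathbf{r})$, and checks the subgradient inequality $\mathcal{D}(\mathbf{r}') \ge \mathcal{D}(\mathbf{r}) + \mathbf{g}(\mathbf{r}) \cdot (\mathbf{r}' - \mathbf{r})$. The utility $U_i(y) = \log(1+y)$, the path graph, $\beta$ and all function names are assumptions made only for this example.

```python
import itertools
import math
import random

def independent_sets(n, edges):
    """All 0/1 vectors on n nodes in which no two adjacent nodes are both active."""
    return [s for s in itertools.product((0, 1), repeat=n)
            if all(not (s[a] and s[b]) for a, b in edges)]

def gibbs(r, ind_sets):
    """pi^r(sigma) proportional to exp(sigma . r), together with log Z(r)."""
    weights = [math.exp(sum(x * y for x, y in zip(s, r))) for s in ind_sets]
    z = sum(weights)
    return [w / z for w in weights], math.log(z)

def best_rate(r_i, beta):
    """argmax over y in [0,1] of beta*log(1+y) - r_i*y (assumed utility U(y) = log(1+y))."""
    if r_i <= 0.0:
        return 1.0
    return min(1.0, max(0.0, beta / r_i - 1.0))

def dual_value(r, ind_sets, beta):
    """D(r) = log Z(r) + sum_i [beta*U(lambda_i(r)) - r_i*lambda_i(r)]."""
    _, log_z = gibbs(r, ind_sets)
    lam = [best_rate(r_i, beta) for r_i in r]
    return log_z + sum(beta * math.log1p(l) - r_i * l for l, r_i in zip(lam, r))

def dual_subgradient(r, ind_sets, beta):
    """g_i(r) = E_{pi^r}[sigma_i] - lambda_i(r)."""
    pi, _ = gibbs(r, ind_sets)
    s = [sum(p * sig[i] for p, sig in zip(pi, ind_sets)) for i in range(len(r))]
    return [s_i - best_rate(r_i, beta) for s_i, r_i in zip(s, r)]

ind_sets = independent_sets(3, [(0, 1), (1, 2)])   # hypothetical path graph
beta = 2.0
random.seed(1)
worst = float("inf")   # smallest observed value of D(r2) - [D(r1) + g(r1).(r2 - r1)]
for _ in range(200):
    r1 = [random.uniform(0.0, 5.0) for _ in range(3)]
    r2 = [random.uniform(0.0, 5.0) for _ in range(3)]
    g1 = dual_subgradient(r1, ind_sets, beta)
    linear = dual_value(r1, ind_sets, beta) + sum(g * (b - a) for g, a, b in zip(g1, r1, r2))
    worst = min(worst, dual_value(r2, ind_sets, beta) - linear)
```

A non-negative `worst` (up to rounding) is consistent with $\mathbf{g}(\mathbf{r})$ being a subgradient of the convex function $\mathcal{D}$.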

Proof. To begin with, observe that the objective of (77) is strictly concave, as entropy is a strictly
concave function over $\mathcal{M}$ and so is $U_i$ for all $i$ under our setup. Therefore, given the constraints
of (77), the unique optimum exists and is achieved. To observe the lack of duality gap, note that
there exist a $\mu \in \mathcal{M}$ and a $\boldsymbol{\lambda} \in [0,1]^n$ that are strictly feasible. Therefore, Slater's condition
implies the lack of duality gap. We defer the proof of uniqueness of the dual optimal solution till a
little later.

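Proof of (1). Given the dual feasible $\mathbf{r} \in \mathbb{R}_+^n$, let $\mu(\mathbf{r}), \boldsymbol{\lambda}(\mathbf{r})$ be the corresponding
primal feasible solutions that maximize the Lagrangian $\mathcal{L}$. Given the structure of $\mathcal{L}$ as in (78), it
follows that $\boldsymbol{\lambda}(\mathbf{r})$ must be such that

$$\lambda_i(\mathbf{r}) = \arg\max_{y \in [0,1]} \left(\beta U_i(y) - r_i y\right), \text{ for all } i.$$

For $\mu(\mathbf{r})$, observe that

$$\frac{\partial \mathcal{L}(\mu, \boldsymbol{\lambda}; \mathbf{r})}{\partial \mu_\sigma} = -\log \mu_\sigma - 1 + \sigma \cdot \mathbf{r}.$$

Since $\mu(\mathbf{r})$ is maximizing $\mathcal{L}$, from the above it follows that $\mu_\sigma(\mathbf{r}) \in (0,1)$ for all $\sigma \in \mathcal{I}(G)$. Therefore,
for any $\sigma, \rho \in \mathcal{I}(G)$ with $\sigma \ne \rho$, it must be that

$$\frac{\partial \mathcal{L}(\mu(\mathbf{r}), \boldsymbol{\lambda}(\mathbf{r}); \mathbf{r})}{\partial \mu_\sigma} = \frac{\partial \mathcal{L}(\mu(\mathbf{r}), \boldsymbol{\lambda}(\mathbf{r}); \mathbf{r})}{\partial \mu_\rho}.$$

That is,

$$\mu_\sigma(\mathbf{r}) \propto \exp(\sigma \cdot \mathbf{r}), \text{ for all } \sigma \in \mathcal{I}(G).$$

Thus, $\mu(\mathbf{r}) = \pi^{\mathbf{r}}$.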

Proof of (2). Given (1), it follows that

$$\mathcal{D}(\mathbf{r}) = \mathcal{L}(\mu(\mathbf{r}), \boldsymbol{\lambda}(\mathbf{r}); \mathbf{r}).$$

Now the dual variables $\mathbf{r}$ capture 'slack' in the corresponding constraints of (77). Specifically,
for a given $\mathbf{r}$, if the corresponding primal solutions are $\mu(\mathbf{r}), \boldsymbol{\lambda}(\mathbf{r})$, then the slack in the $i$th
constraint is $s_i(\mathbf{r}) - \lambda_i(\mathbf{r})$: if it is positive, $r_i$ should be decreased, and if it is negative, $r_i$ should
be increased. This intuition is formalized in optimization theory (e.g. see the book by Boyd and
Vandenberghe [7]) by establishing that a subgradient of the dual optimization at $\mathbf{r}$ is given by the
vector $\mathbf{g}(\mathbf{r}) \in \mathbb{R}^n$ with

$$g_i(\mathbf{r}) = s_i(\mathbf{r}) - \lambda_i(\mathbf{r}).$$
Proof of (3). The uniqueness of the solution of (77) was already explained. To understand the uniqueness
of $\mathbf{r}^*$, consider the independent set $\mathbf{e}_i$, which has only node $i$ in it, and the null set $\mathbf{0}$. Then,
since $\mu(\mathbf{r}^*) = \pi^{\mathbf{r}^*}$, it follows that

$$\mu_{\mathbf{e}_i}(\mathbf{r}^*) = \mu_{\mathbf{0}}(\mathbf{r}^*) \exp(r_i^*).$$

Now suppose, to the contrary, that there is another optimal solution of (80), $\hat{\mathbf{r}} \ne \mathbf{r}^*$. Then, it will
immediately contradict the above, as $\mu^*$ is unique as discussed above. This completes the proof of
(3).

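Convergence of $\mathbf{r}(j), \boldsymbol{\lambda}(j)$. In light of Lemma 18(2), it follows that the algorithm (8) is
motivated by the standard projected dual subgradient algorithm. The algorithm uses the estimate
$\hat{\mathbf{s}}(j)$ in place of $\mathbf{s}(\mathbf{r}(j))$, but the exact update for $\boldsymbol{\lambda}(\mathbf{r}(j))$. That is, for all $i$,

$$r_i(j+1) = \left[r_i(j) + \alpha(j)\left(\lambda_i(j) - \hat{s}_i(j)\right)\right]_+.$$

To this end, define the 'error' vector

$$\mathbf{e}(j) = -\hat{\mathbf{s}}(j) + \mathbf{s}(\mathbf{r}(j)).$$

And, let

$$d(j) = \|\mathbf{r}(j) - \mathbf{r}^*\|_2^2.$$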

Now consider the relation between $d(j+1)$ and $d(j)$. Since the projection $[\cdot]_+$ is non-expansive,

d{j + l) < ||r(i)-r* + i[A(i)-s(r(i))+e(j)]||2 

J 

< d{j) + 2[r(i) - r*]^ • i[A(j) - s(r(j)) + e(j)] +o(^ 

J \r 

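where we have used the fact that each component of $\boldsymbol{\lambda}(j) - \mathbf{s}(\mathbf{r}(j)) + \mathbf{e}(j)$ is $O(1)$. Define the
error in the optimal cost at the $j$th step as

$$\Delta(j) = \mathcal{D}(\mathbf{r}(j)) - \mathcal{D}(\mathbf{r}^*).$$

By definition, $\Delta(j) \ge 0$. Since the dual objective $\mathcal{D}$ is convex, and $\mathbf{s}(\mathbf{r}(j)) - \boldsymbol{\lambda}(j)$ is its subgradient
at $\mathbf{r}(j)$, we have

$$[\mathbf{r}(j) - \mathbf{r}^*]^T \cdot [\boldsymbol{\lambda}(j) - \mathbf{s}(\mathbf{r}(j))] \le -\Delta(j). \tag{83}$$

Also, as used earlier, $r_i(j) = O(\log j)$ for all $i$. Therefore, from the above we obtain that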

<i(i + l) < ^(i)-^+0( '°^'^'';l'"l°° IKi)lli)+0(-^). (84) 

Note that the analysis of Lemma 10 applies to bound $\|\mathbf{e}(j)\|_1$ as is. That is,

E[||e(j)||i] = o(^). (85) 
Using this and taking expectation on both sides of inequality (84), we obtain

E[<i(i + l)l < E[dO)]-HE[A(j)]+o( "-"°'^'-;^;,+ l-'l°°> )+o(ji 

Summing the above inequality from 1 to $\infty$, it follows that

< E [d{oo)] 



< E[d(l)]- j;-E[A(j)] +0(n). 



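By rearranging the terms and using $\mathbb{E}[d(1)] < \infty$, it follows that $\sum_{j} \frac{1}{j}\, \mathbb{E}[\Delta(j)] < \infty$. Since
$\sum_{j} \frac{1}{j} = \infty$, we can conclude that

$$\liminf_{j} \mathbb{E}[\Delta(j)] = 0 \ \Longrightarrow\ \liminf_{j} \Delta(j) = 0, \text{ with probability 1} \ \Longrightarrow\ \liminf_{j} \|\mathbf{r}(j) - \mathbf{r}^*\| = 0, \text{ with probability 1}, \tag{86}$$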



where we have used the fact that dual optimization (j80|) has a unique solution and it is convex 
minimization problem. Now, rest of the proof of r(j) — > r*(j) with probability 1 follows exactly 
the same set of arguments as those used in the proof of Theorem [TJ The convergence of A(j) — > 
A(r*) = A follows due to continuity of solution of concave maximization (j82p with respect to r. 

Utility of A, rate stability. To begin with, we observe that convergence r(j) — > r*(j) and 
A(j) — > A(r*) = A with probability 1 implies the rate stability using exactly the same arguments 
as those used in Lemma [T2l 

To establish goodness of the A, note that it along with fj,* optimizes (j77p . Now A*, the 
optimal allocation (as per ([3])) along with appropriate distribution, say u* onT(G)) is a feasible 
solution. Therefore, it follows that 

i i 

< Hsij(/2*)+/?^i7,(A,) 

i 

< log|T(G)|+/?j;C/,(A,). (87) 

i 

In the above, we have used the fact that entropy is non-negative and that the entropy of
a discrete random variable is at most the logarithm of the cardinality of its
support. Then (87) immediately implies the desired result. This completes the proof of
Theorem 3.

6.2 Proof of Theorem 4

The proof of Theorem 4, in a nutshell, requires us to establish that the average rate allocation
$\bar{\boldsymbol{\lambda}}$ has near-optimal total utility. This follows using arguments similar to those used in proving
Theorem 3. That is, we establish that $\bar{\boldsymbol{\lambda}}$ ends up approximately solving optimization problem
(77). This property follows primarily because congestion control algorithm 2 with update
(11) is designed as a constant step-size dual 'subgradient' algorithm. We will formalize
this in the rest of this section. We begin with a useful property that establishes a uniform bound
on the components of $\mathbf{r}(\cdot)$ and subsequently implies a uniform bound on the components of the queue-size
vector $\mathbf{Q}(\cdot)$ for all time. This will be followed by a proof of the goodness of the average
rate $\bar{\boldsymbol{\lambda}}$ to conclude the proof of Theorem 4.

Uniform bound on $\|\mathbf{r}(j)\|_\infty$. We state and prove the following bound on $\|\mathbf{r}(j)\|_\infty$ starting
with $\mathbf{r}(0) = \mathbf{0}$.

Lemma 19 Under the update rule (11), for all $1 \le i \le n$,

$$r_i(j) \in [0, \beta V + \alpha], \text{ for all } j,$$
where recall that V is defined in (llOp and a is the constant step-size used in the update ()lip . 

Proof. To prove this Lemma, consider any $i$, $1 \le i \le n$. Now for any $j$, $r_i(j) \ge 0$ by the definition
(cf. (11)). To prove $r_i(j) \le \beta V + \alpha$, we will use the principle of mathematical induction. To this
end, for the base case $j = 0$, $r_i(0) = 0$ by definition. Suppose, as the inductive hypothesis, that
the property $r_i(j) \le \beta V + \alpha$ is true for all $j \le J$. Now we wish to establish this property for
$j = J + 1$. To this end, we consider two cases: (a) $r_i(J) \le \beta V$, or (b) $r_i(J) \in (\beta V, \beta V + \alpha]$.
First consider case (a). By (11), it follows that

$$r_i(J+1) = [r_i(J) - \alpha \hat{s}_i(J)]_+ + \alpha \lambda_i(J) \le r_i(J) + \alpha \lambda_i(J) \le r_i(J) + \alpha \le \beta V + \alpha.$$

In the above we have used the fact that $\lambda_i(J) \in [0,1]$ by definition. Now consider case (b). For this,
note that if $r_i(J) \in (\beta V, \beta V + \alpha]$, then $\lambda_i(J) = 0$. This is because by (11), $\lambda_i(J)$ solves

$$\lambda_i(J) \in \arg\max_{y \in [0,1]} \{\beta U_i(y) - r_i(J) y\}, \tag{88}$$

and for any $y \in [0,1]$,

$$\frac{d}{dy}\left(\beta U_i(y) - r_i(J) y\right) = \beta U_i'(y) - r_i(J) \le \beta V - r_i(J) < 0. \tag{89}$$

That is, the optimal solution of (88) is 0. Hence $r_i(J+1) = [r_i(J) - \alpha \hat{s}_i(J)]_+ \le r_i(J) \le \beta V + \alpha$. This completes the proof of Lemma 19. □
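
The bound of Lemma 19 can be observed in a short simulation of the update $r \leftarrow [r - \alpha \hat{s}]_+ + \alpha \lambda$ for a single node. This is a sketch under stated assumptions, not the paper's algorithm in full: the utility $U(y) = \log(1+y)$ (so that $U'(y) \le V = 1$ on $[0,1]$), the values of $\beta$ and $\alpha$, and the uniformly random stand-in for the empirical service rate are all hypothetical.

```python
import random

def rate_choice(r_i, beta):
    """argmax over y in [0,1] of beta*log(1+y) - r_i*y (assumed utility U(y) = log(1+y))."""
    if r_i <= 0.0:
        return 1.0
    return min(1.0, max(0.0, beta / r_i - 1.0))

def simulate_r(beta, alpha, steps, seed=0):
    """Run the constant step-size update and return the largest r observed."""
    rng = random.Random(seed)
    r = 0.0
    r_max = 0.0
    for _ in range(steps):
        lam = rate_choice(r, beta)
        s_hat = 0.05 * rng.random()      # stand-in for an empirical service rate in [0, 1]
        r = max(r - alpha * s_hat, 0.0) + alpha * lam
        r_max = max(r_max, r)
    return r_max

beta, alpha = 4.0, 0.1
V = 1.0                                  # bound on U'(y) = 1/(1+y) over [0, 1]
r_max = simulate_r(beta, alpha, steps=100000)
```

The low service level is chosen deliberately so that $r$ is pushed toward $\beta V$; once $r$ exceeds $\beta V$ the chosen rate drops to zero, which is the mechanism used in case (b) of the proof.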

Uniform bound on $\|\mathbf{Q}(j)\|_\infty$. We state and prove the following bound on $\|\mathbf{Q}(j)\|_\infty$ starting
with $\mathbf{Q}(0) = \mathbf{0}$.

Lemma 20 Under congestion control algorithm 2, starting with empty queues, i.e. $\mathbf{Q}(0) = \mathbf{0}$,
the following holds for all $t \ge 0$:

$$Q_i(t) \le \frac{T}{\alpha}(\beta V + 2\alpha).$$

Proof. In what follows, we will show that for time instances $t = jT$, for $j \ge 0$, the queue size is
bounded as

$$Q_i(jT) \le \frac{T}{\alpha}\, r_i(j), \text{ for all } i. \tag{90}$$

Then (90), along with the bound on $r_i(\cdot)$ implied by Lemma 19, will imply

$$Q_i(jT) \le \frac{T}{\alpha}(\beta V + \alpha), \text{ for all } i. \tag{91}$$

Finally, by noticing that $\lambda_i(j) \in [0,1]$ for all $i, j$, it follows that for any $t \in [jT, (j+1)T)$,
$Q_i(t) \le Q_i(jT) + T$. Therefore, we will obtain the desired result of Lemma 20.
Now we prove the remaining bound as stated in (90). To this end, note that

$$Q_i((j+1)T) \le [Q_i(jT) - \hat{s}_i(j) T]_+ + \lambda_i(j) T. \tag{92}$$

This follows by imagining that all the arrival traffic in $[jT, (j+1)T)$, $\lambda_i(j) T$ amount of data, is
added to the queue at the end of the interval; service $\hat{s}_i(j) T$ is used only to serve data that was
present at time $jT$.

Based on (92), we will establish (90) by means of the principle of mathematical induction.
For the base case of $j = 0$, we have $Q_i(0) = 0$ and $r_i(0) = 0$. For the induction hypothesis, assume
it to hold true for all $j \le J$. For $j = J + 1$, we wish to establish that the relation holds. To this
end, using (92) it follows that

$$\begin{aligned}
Q_i((J+1)T) &\le [Q_i(JT) - \hat{s}_i(J) T]_+ + \lambda_i(J) T \\
&\le \left[\frac{T}{\alpha}\, r_i(J) - \hat{s}_i(J) T\right]_+ + \lambda_i(J) T \\
&= \frac{T}{\alpha}\left([r_i(J) - \alpha \hat{s}_i(J)]_+ + \alpha \lambda_i(J)\right) \\
&= \frac{T}{\alpha}\, r_i(J+1).
\end{aligned} \tag{93}$$

Here the last equality follows by definition (11). This completes the proof of (90) and Lemma
20. □
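
The coupling between queue size and $r_i$ in (90) can be illustrated with a slotted single-queue simulation. The model below is an assumption made for the example only: time is divided into unit slots, the node is scheduled in each slot independently with a fixed hypothetical probability, arrivals are a fluid amount $\lambda_i(j)$ per slot, and the utility is $U(y) = \log(1+y)$ with $V = 1$. None of these modelling choices is taken from the paper.

```python
import random

def rate_choice(r_i, beta):
    """argmax over y in [0,1] of beta*log(1+y) - r_i*y (assumed utility U(y) = log(1+y))."""
    if r_i <= 0.0:
        return 1.0
    return min(1.0, max(0.0, beta / r_i - 1.0))

def simulate_queue(beta, alpha, T, periods, p_serve, seed=0):
    """Return (max over j of Q(jT) - (T/alpha)*r(j), largest queue size seen at any slot)."""
    rng = random.Random(seed)
    q, r = 0.0, 0.0
    worst_gap = float("-inf")
    q_peak = 0.0
    for _ in range(periods):
        worst_gap = max(worst_gap, q - (T / alpha) * r)   # check (90) at t = jT
        lam = rate_choice(r, beta)
        served_slots = 0
        for _ in range(T):
            sched = 1 if rng.random() < p_serve else 0    # node scheduled in this slot?
            served_slots += sched
            q = max(q - sched, 0.0) + lam                 # serve first, then admit arrivals
            q_peak = max(q_peak, q)
        s_hat = served_slots / T                          # empirical service rate of the period
        r = max(r - alpha * s_hat, 0.0) + alpha * lam     # update rule for r_i
    return worst_gap, q_peak

beta, alpha, T, V = 4.0, 0.1, 20, 1.0
worst_gap, q_peak = simulate_queue(beta, alpha, T, periods=5000, p_serve=0.3)
queue_bound = (T / alpha) * (beta * V + 2 * alpha)
```

Here `worst_gap` should stay non-positive (up to rounding), matching (90), and `q_peak` should stay below `queue_bound`, matching Lemma 20.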



A useful variational characterization. We state the Gibbsian variational characterization 
(e.g. see book [TH]) of the distribution tt"" that will be useful later in the proof. 

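Lemma 21 Given $\mathbf{r} \in \mathbb{R}^n$, $\pi^{\mathbf{r}}$ is the unique solution of

$$\text{maximize } \mathbb{E}_\mu[\sigma \cdot \mathbf{r}] + H_{ER}(\mu) \quad \text{over } \mu \in \mathcal{M}, \tag{94}$$

where recall that $\mathcal{M}$ is the space of probability distributions over $\mathcal{I}(G)$. Further,

$$\mathbb{E}_{\pi^{\mathbf{r}}}[\sigma \cdot \mathbf{r}] \ge \max_{\boldsymbol{\lambda} \in \Lambda} \boldsymbol{\lambda} \cdot \mathbf{r} - \log |\mathcal{I}(G)|. \tag{95}$$

Proof. The claim (94) was established implicitly in Lemma 18. To see an explicit proof, consider the
following. For any $\mu \in \mathcal{M}$,

$$\begin{aligned}
\mathbb{E}_\mu[\sigma \cdot \mathbf{r}] + H_{ER}(\mu) &= \sum_{\sigma \in \mathcal{I}(G)} (\sigma \cdot \mathbf{r})\, \mu_\sigma - \sum_{\sigma \in \mathcal{I}(G)} \mu_\sigma \log \mu_\sigma \\
&\overset{(a)}{=} \sum_{\sigma \in \mathcal{I}(G)} \mu_\sigma \left(\log Z(\mathbf{r}) + \log \pi^{\mathbf{r}}_\sigma\right) - \sum_{\sigma \in \mathcal{I}(G)} \mu_\sigma \log \mu_\sigma \\
&= \log Z(\mathbf{r}) + \sum_{\sigma \in \mathcal{I}(G)} \mu_\sigma \log \frac{\pi^{\mathbf{r}}_\sigma}{\mu_\sigma} \\
&\overset{(b)}{\le} \log Z(\mathbf{r}).
\end{aligned} \tag{96}$$

In the above, (a) follows from the fact that $\pi^{\mathbf{r}}_\sigma = \exp(\sigma \cdot \mathbf{r}) / Z(\mathbf{r})$.
The (b) follows from an application of Jensen's inequality. The above shows that the optimal
cost of (94) is $\log Z(\mathbf{r})$ and is achieved iff $\mu = \pi^{\mathbf{r}}$. This establishes the first claim of Lemma
21.

To see (95), define $\mu^*$ as

$$\mu^*_\sigma = \begin{cases} 1 & \text{if } \sigma = \sigma^* \\ 0 & \text{o.w.} \end{cases}$$

Here $\sigma^* = \arg\max_{\sigma \in \mathcal{I}(G)} \sigma \cdot \mathbf{r}$. Then, using the above it follows that

$$\begin{aligned}
\mathbf{s}(\mathbf{r}) \cdot \mathbf{r} = \mathbb{E}_{\pi^{\mathbf{r}}}[\sigma \cdot \mathbf{r}] &\ge \mathbb{E}_{\mu^*}[\sigma \cdot \mathbf{r}] + H_{ER}(\mu^*) - H_{ER}(\pi^{\mathbf{r}}) \\
&\overset{(a)}{\ge} \sigma^* \cdot \mathbf{r} + 0 - \log |\mathcal{I}(G)| \\
&\overset{(b)}{=} \max_{\boldsymbol{\lambda} \in \Lambda} \boldsymbol{\lambda} \cdot \mathbf{r} - \log |\mathcal{I}(G)|.
\end{aligned} \tag{97}$$

In the above, (a) follows from the definition of $\mu^*$ and the fact that for any distribution in $\mathcal{M}$, the
entropy is at most $\log |\mathcal{I}(G)|$. The (b) follows because any $\boldsymbol{\lambda} \in \Lambda$ is a convex combination
of elements in $\mathcal{I}(G)$. □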
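
Both claims of Lemma 21 are easy to verify numerically on a small instance. The sketch below checks that the objective of (94) equals $\log Z(\mathbf{r})$ at $\pi^{\mathbf{r}}$, that randomly drawn distributions do no better, and that $\mathbb{E}_{\pi^{\mathbf{r}}}[\sigma \cdot \mathbf{r}] \ge \max_\sigma \sigma \cdot \mathbf{r} - \log|\mathcal{I}(G)|$. The path graph, the vector $\mathbf{r}$ and the function names are hypothetical, and the maximum over $\Lambda$ is replaced by the maximum over independent sets, as in step (b).

```python
import itertools
import math
import random

def independent_sets(n, edges):
    """All 0/1 vectors on n nodes in which no two adjacent nodes are both active."""
    return [s for s in itertools.product((0, 1), repeat=n)
            if all(not (s[a] and s[b]) for a, b in edges)]

def weight(sigma, r):
    """sigma . r"""
    return sum(x * y for x, y in zip(sigma, r))

def objective(mu, r, ind_sets):
    """E_mu[sigma . r] + H(mu), with the convention 0 * log 0 = 0."""
    energy = sum(m * weight(s, r) for m, s in zip(mu, ind_sets))
    entropy = -sum(m * math.log(m) for m in mu if m > 0.0)
    return energy + entropy

ind_sets = independent_sets(3, [(0, 1), (1, 2)])   # hypothetical path graph
r = [1.5, 0.7, 2.0]                                 # hypothetical non-negative weights
raw = [math.exp(weight(s, r)) for s in ind_sets]
log_z = math.log(sum(raw))
pi_r = [w / sum(raw) for w in raw]                  # Gibbs distribution pi^r

random.seed(2)
best_random = float("-inf")
for _ in range(1000):
    draws = [random.random() for _ in ind_sets]
    mu = [d / sum(draws) for d in draws]            # a random distribution on I(G)
    best_random = max(best_random, objective(mu, r, ind_sets))

expected_weight = sum(p * weight(s, r) for p, s in zip(pi_r, ind_sets))
max_weight = max(weight(s, r) for s in ind_sets)
```

The gap between `log_z` and `best_random` is the relative entropy between the random candidate and $\pi^{\mathbf{r}}$, which is what the Jensen step (b) in (96) bounds.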



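Some properties. Here we state some properties that will be useful in completing the proof
of Theorem 4. To begin with, let $\boldsymbol{\lambda}^*$ be the optimal solution to the congestion control problem.
At any stage $j$, $\boldsymbol{\lambda}(j)$ is obtained as

$$\lambda_i(j) \in \arg\max_{y \in [0,1]} \{\beta U_i(y) - r_i(j) y\}, \text{ for all } i.$$

Therefore, it follows that for any $j$

$$\beta U_i(\lambda_i(j)) - r_i(j) \lambda_i(j) \ge \beta U_i(\lambda_i^*) - r_i(j) \lambda_i^*. \tag{98}$$

Since $\boldsymbol{\lambda}^* \in \Lambda$, we have

$$\sum_i \lambda_i^* r_i(j) \le \max_{\boldsymbol{\lambda} \in \Lambda} \boldsymbol{\lambda} \cdot \mathbf{r}(j). \tag{99}$$

Define notation $m^*(\mathbf{r}) = \max_{\boldsymbol{\lambda} \in \Lambda} \boldsymbol{\lambda} \cdot \mathbf{r}$. From (98) and (99), we have

$$\mathbf{r}(j) \cdot \boldsymbol{\lambda}(j) \le \beta\left(\sum_i U_i(\lambda_i(j))\right) - \beta\left(\sum_i U_i(\lambda_i^*)\right) + m^*(\mathbf{r}(j)). \tag{100}$$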

We will observe another useful property. By Lemma [TOl we have ||r(j)||oo bounded by fiV + a. 
Therefore, using the mixing time bounds and arguments utilized in Lemma [Till we obtain that 
by the choice of appropriately large T as 

T = exp(e(;3nF))e(<^5:±^*^!), (101) 

we have that for all j, 

nU3)\:Fi]-sMm < io(^y + a)n ' 

In above, the conditioning J^j represents the filteration (or information) till time L{j); while 
recall that the random variable Sj(j) is the empirical service rate in [L{j),L{j + 1)). 

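Wrapping up: Completing proof of Theorem 4. Now, let us start with the algorithm's
update rule (11). Specifically, for a given $i$, squaring both sides of (11) for $r_i(\cdot)$ gives us

$$\begin{aligned}
r_i^2(j+1) &= \left([r_i(j) - \alpha \hat{s}_i(j)]_+ + \alpha \lambda_i(j)\right)^2 \\
&= [r_i(j) - \alpha \hat{s}_i(j)]_+^2 + 2\alpha \lambda_i(j) [r_i(j) - \alpha \hat{s}_i(j)]_+ + \alpha^2 \lambda_i^2(j) \\
&\overset{(a)}{\le} (r_i(j) - \alpha \hat{s}_i(j))^2 + 2\alpha \lambda_i(j) r_i(j) + \alpha^2 \\
&\overset{(b)}{\le} r_i(j)^2 + 2\alpha r_i(j) [\lambda_i(j) - \hat{s}_i(j)] + 2\alpha^2.
\end{aligned} \tag{103}$$

In the above, (a) follows from the fact that $[x]_+^2 \le x^2$ and $\lambda_i(j) \in [0,1]$ for all $j$; and (b) follows
from the fact that $\hat{s}_i(j) \in [0,1]$ for all $i, j$. From (103), we have that

$$\sum_i \left[r_i^2(j+1) - r_i^2(j)\right] \le 2\alpha\, \mathbf{r}(j) \cdot (\boldsymbol{\lambda}(j) - \hat{\mathbf{s}}(j)) + 2 n \alpha^2 = 2\alpha \left[\mathbf{r}(j) \cdot \boldsymbol{\lambda}(j) - \mathbf{r}(j) \cdot \mathbf{s}(\mathbf{r}(j)) + \mathbf{r}(j) \cdot (\mathbf{s}(\mathbf{r}(j)) - \hat{\mathbf{s}}(j)) + n\alpha\right]. \tag{104}$$

By (97) and since $|\mathcal{I}(G)| \le 2^n$, we have

$$-2\alpha\, \mathbf{r}(j) \cdot \mathbf{s}(\mathbf{r}(j)) \le -2\alpha\, m^*(\mathbf{r}(j)) + 2\alpha n. \tag{105}$$

Therefore, using (100) we have

$$2\alpha\, \mathbf{r}(j) \cdot \boldsymbol{\lambda}(j) - 2\alpha\, \mathbf{r}(j) \cdot \mathbf{s}(\mathbf{r}(j)) \le 2\alpha\beta \left(\sum_i U_i(\lambda_i(j)) - U_i(\lambda_i^*)\right) + 2\alpha n. \tag{106}$$

Now using (106) in (104) and the fact that $\alpha^2 \le \alpha$ because $\alpha \in (0,1)$, we have

$$\sum_i \left[r_i^2(j+1) - r_i^2(j)\right] \le 2\alpha\beta\left(U(\boldsymbol{\lambda}(j)) - U(\boldsymbol{\lambda}^*)\right) + 2\alpha\, \mathbf{r}(j) \cdot (\mathbf{s}(\mathbf{r}(j)) - \hat{\mathbf{s}}(j)) + 4\alpha n, \tag{107}$$

where we have used the notation $U(\boldsymbol{\lambda}) = \sum_i U_i(\lambda_i)$. Now summing (107) over $j = 0$ to $J - 1$,
using the fact that $\mathbf{r}(0) = \mathbf{0}$ and dividing both sides by $J$, we have

$$\frac{1}{J}\sum_i r_i^2(J) \le 2\alpha\beta\left(\frac{1}{J}\sum_{j=0}^{J-1} U(\boldsymbol{\lambda}(j)) - U(\boldsymbol{\lambda}^*)\right) + \frac{1}{J}\left(\sum_{j=0}^{J-1} 2\alpha\, \mathbf{r}(j) \cdot (\mathbf{s}(\mathbf{r}(j)) - \hat{\mathbf{s}}(j))\right) + 4\alpha n. \tag{108}$$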

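Now, define $A(j) = 2\alpha\, \mathbf{r}(j) \cdot (\mathbf{s}(\mathbf{r}(j)) - \hat{\mathbf{s}}(j))$ and $X(j) = A(j) - \mathbb{E}[A(j) \mid \mathcal{F}_j]$. By definition,
$S(J) = \sum_{j=0}^{J-1} X(j)$ is a martingale with respect to the filtration $\{\mathcal{F}_j\}_{j \ge 1}$. With this notation, we
have that for any $J$,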

7 (^E2«r(j) • (s(r(i)) -s(j))j = j (^^^^^'^ +lE[A(j)|^,: 

W 1 a Be 

In the above, (a) follows from (102) and the bound on $\mathbf{r}(\cdot)$ from Lemma 19. Finally, note that $S(\cdot)$
is a martingale with bounded increments due to the uniform bound on $\mathbf{r}(\cdot)$, the fact that $\mathbf{s}(\cdot), \hat{\mathbf{s}}(\cdot)$
are vectors in $[0,1]^n$ and $\alpha \in (0,1)$. Therefore, by the Strong Law of Large Numbers for
martingales with bounded increments, it follows that

$$\lim_{J \to \infty} \frac{1}{J} S(J) = 0, \text{ with probability 1.} \tag{110}$$

That is, with probability 1 

lim sup - I X^2ar(j) • (s(r(i)) - s(j)) | < (111) 

Using (jllip in (jlOSp along with Lemma [T9l and then taking J ^ oo, we have that with proba- 
bility 1, 
Finally, observe that by concavity of the function $U_i(\cdot)$ along with Jensen's inequality, we have
that for $\bar{\lambda}_i(J) = \left(\sum_{j=0}^{J-1} \lambda_i(j)\right) / J$,

$$\frac{1}{J}\sum_{j=0}^{J-1} U_i(\lambda_i(j)) \le U_i(\bar{\lambda}_i(J)).$$

Therefore, the following desired conclusion of Theorem 5 follows from (112) along with the choice
of $\beta = 4n/\varepsilon$: with probability 1,

$$\liminf_{J \to \infty} U(\bar{\boldsymbol{\lambda}}(J)) \ge U(\boldsymbol{\lambda}^*) - \varepsilon. \tag{113}$$

7 Conclusion 

In this paper, we have presented a simple, distributed randomized algorithm for scheduling and
congestion control in a network. Our algorithm is essentially a random access protocol with
time-varying access probabilities. Our algorithm for scheduling, in the presence of exogenous
arrivals, achieves throughput optimality, while our algorithm for scheduling with congestion
controlled arrivals achieves near-optimal resource allocation when nodes have concave utilities.
We believe that the algorithmic method presented in this paper should be of general interest.